Starbucks Capstone Challenge

Introduction

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks.

Not all users receive the same offer, and that is the challenge to solve with this data set.

Your task is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type. This data set is a simplified version of the real Starbucks app because the underlying simulator only has one product whereas Starbucks actually sells dozens of products.

Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You'll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.

You'll be given transactional data showing user purchases made on the app including the timestamp of purchase and the amount of money spent on a purchase. This transactional data also has a record for each offer that a user receives as well as a record for when a user actually views the offer. There are also records for when a user completes an offer.

Keep in mind as well that someone using the app might make a purchase through the app without having received an offer or seen an offer.

Example

To give an example, a user could receive a discount offer buy 10 dollars get 2 off on Monday. The offer is valid for 10 days from receipt. If the customer accumulates at least 10 dollars in purchases during the validity period, the customer completes the offer.

However, there are a few things to watch out for in this data set. Customers do not opt into the offers that they receive; in other words, a user can receive an offer, never actually view the offer, and still complete the offer. For example, a user might receive the "buy 10 dollars get 2 dollars off offer", but the user never opens the offer during the 10 day validity period. The customer spends 15 dollars during those ten days. There will be an offer completion record in the data set; however, the customer was not influenced by the offer because the customer never viewed the offer.

Cleaning

This makes data cleaning especially important and tricky.

You'll also want to take into account that some demographic groups will make purchases even if they don't receive an offer. From a business perspective, if a customer is going to make a 10 dollar purchase without an offer anyway, you wouldn't want to send a buy 10 dollars get 2 dollars off offer. You'll want to try to assess what a certain demographic group will buy when not receiving any offers.

Final Advice

Because this is a capstone project, you are free to analyze the data any way you see fit. For example, you could build a machine learning model that predicts how much someone will spend based on demographics and offer type. Or you could build a model that predicts whether or not someone will respond to an offer. Or, you don't need to build a machine learning model at all. You could develop a set of heuristics that determine what offer you should send to each customer (i.e., 75 percent of women customers who were 35 years old responded to offer A vs 40 percent from the same demographic to offer B, so send offer A).

Data Sets

The data is contained in three files:

  • portfolio.json - containing offer ids and meta data about each offer (duration, type, etc.)
  • profile.json - demographic data for each customer
  • transcript.json - records for transactions, offers received, offers viewed, and offers completed

Here is the schema and explanation of each variable in the files:

portfolio.json

  • id (string) - offer id
  • offer_type (string) - type of offer ie BOGO, discount, informational
  • difficulty (int) - minimum required spend to complete an offer
  • reward (int) - reward given for completing an offer
  • duration (int) - time for offer to be open, in days
  • channels (list of strings)

profile.json

  • age (int) - age of the customer
  • became_member_on (int) - date when customer created an app account
  • gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
  • id (str) - customer id
  • income (float) - customer's income

transcript.json

  • event (str) - record description (ie transaction, offer received, offer viewed, etc.)
  • person (str) - customer id
  • time (int) - time in hours since start of test. The data begins at time t=0
  • value - (dict of strings) - either an offer id or transaction amount depending on the record
In [1]:
import sys

!"{sys.executable}" -m pip install https://github.com/pandas-profiling/pandas-profiling/archive/refs/tags/v3.0.0.zip
!jupyter nbextension enable --py widgetsnbextension
!"{sys.executable}" -m pip install panel
!pip install sagemaker==1.72.0
Collecting https://github.com/pandas-profiling/pandas-profiling/archive/refs/tags/v3.0.0.zip
  Using cached https://github.com/pandas-profiling/pandas-profiling/archive/refs/tags/v3.0.0.zip
  Preparing metadata (setup.py) ... done
Requirement already satisfied: joblib in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (1.0.1)
Requirement already satisfied: scipy>=1.4.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (1.5.3)
Requirement already satisfied: pandas!=1.0.0,!=1.0.1,!=1.0.2,!=1.1.0,>=0.25.3 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (1.1.5)
Requirement already satisfied: matplotlib>=3.2.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (3.3.4)
Requirement already satisfied: pydantic>=1.8.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (1.8.2)
Requirement already satisfied: PyYAML>=5.0.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (5.4.1)
Requirement already satisfied: jinja2>=2.11.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (3.0.1)
Requirement already satisfied: visions[type_image_path]==0.7.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (0.7.1)
Requirement already satisfied: numpy>=1.16.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (1.19.5)
Requirement already satisfied: htmlmin>=0.1.12 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (0.1.12)
Requirement already satisfied: missingno>=0.4.2 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (0.5.0)
Requirement already satisfied: phik>=0.11.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (0.12.0)
Requirement already satisfied: tangled-up-in-unicode==0.1.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (0.1.0)
Requirement already satisfied: requests>=2.24.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (2.25.1)
Requirement already satisfied: tqdm>=4.48.2 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (4.61.1)
Requirement already satisfied: seaborn>=0.10.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas-profiling==3.0.0) (0.11.2)
Requirement already satisfied: networkx>=2.4 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from visions[type_image_path]==0.7.1->pandas-profiling==3.0.0) (2.5)
Requirement already satisfied: multimethod==1.4 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from visions[type_image_path]==0.7.1->pandas-profiling==3.0.0) (1.4)
Requirement already satisfied: bottleneck in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from visions[type_image_path]==0.7.1->pandas-profiling==3.0.0) (1.3.2)
Requirement already satisfied: attrs>=19.3.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from visions[type_image_path]==0.7.1->pandas-profiling==3.0.0) (21.2.0)
Requirement already satisfied: Pillow in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from visions[type_image_path]==0.7.1->pandas-profiling==3.0.0) (8.4.0)
Requirement already satisfied: imagehash in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from visions[type_image_path]==0.7.1->pandas-profiling==3.0.0) (4.2.1)
Requirement already satisfied: MarkupSafe>=2.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from jinja2>=2.11.1->pandas-profiling==3.0.0) (2.0.1)
Requirement already satisfied: python-dateutil>=2.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from matplotlib>=3.2.0->pandas-profiling==3.0.0) (2.8.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from matplotlib>=3.2.0->pandas-profiling==3.0.0) (1.3.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from matplotlib>=3.2.0->pandas-profiling==3.0.0) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/cycler-0.10.0-py3.6.egg (from matplotlib>=3.2.0->pandas-profiling==3.0.0) (0.10.0)
Requirement already satisfied: pytz>=2017.2 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pandas!=1.0.0,!=1.0.1,!=1.0.2,!=1.1.0,>=0.25.3->pandas-profiling==3.0.0) (2021.1)
Requirement already satisfied: dataclasses>=0.6 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pydantic>=1.8.1->pandas-profiling==3.0.0) (0.8)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from pydantic>=1.8.1->pandas-profiling==3.0.0) (3.10.0.0)
Requirement already satisfied: idna<3,>=2.5 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from requests>=2.24.0->pandas-profiling==3.0.0) (2.10)
Requirement already satisfied: chardet<5,>=3.0.2 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from requests>=2.24.0->pandas-profiling==3.0.0) (4.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from requests>=2.24.0->pandas-profiling==3.0.0) (2021.5.30)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from requests>=2.24.0->pandas-profiling==3.0.0) (1.26.5)
Requirement already satisfied: six in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from cycler>=0.10->matplotlib>=3.2.0->pandas-profiling==3.0.0) (1.16.0)
Requirement already satisfied: decorator>=4.3.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from networkx>=2.4->visions[type_image_path]==0.7.1->pandas-profiling==3.0.0) (5.0.9)
Requirement already satisfied: PyWavelets in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from imagehash->visions[type_image_path]==0.7.1->pandas-profiling==3.0.0) (1.1.1)
Config option `kernel_spec_manager_class` not recognized by `EnableNBExtensionApp`.
Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK
Requirement already satisfied: panel in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (0.12.1)
Requirement already satisfied: pyviz-comms>=0.7.4 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from panel) (2.1.0)
Requirement already satisfied: markdown in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from panel) (3.3.6)
Requirement already satisfied: tqdm>=4.48.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from panel) (4.61.1)
Requirement already satisfied: pyct>=0.4.4 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from panel) (0.4.8)
Requirement already satisfied: bokeh<2.4.0,>=2.3.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from panel) (2.3.3)
Requirement already satisfied: requests in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from panel) (2.25.1)
Requirement already satisfied: param>=1.10.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from panel) (1.12.0)
Requirement already satisfied: bleach in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from panel) (4.1.0)
Requirement already satisfied: pillow>=7.1.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from bokeh<2.4.0,>=2.3.0->panel) (8.4.0)
Requirement already satisfied: PyYAML>=3.10 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from bokeh<2.4.0,>=2.3.0->panel) (5.4.1)
Requirement already satisfied: tornado>=5.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from bokeh<2.4.0,>=2.3.0->panel) (6.1)
Requirement already satisfied: typing-extensions>=3.7.4 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from bokeh<2.4.0,>=2.3.0->panel) (3.10.0.0)
Requirement already satisfied: numpy>=1.11.3 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from bokeh<2.4.0,>=2.3.0->panel) (1.19.5)
Requirement already satisfied: packaging>=16.8 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from bokeh<2.4.0,>=2.3.0->panel) (21.3)
Requirement already satisfied: python-dateutil>=2.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from bokeh<2.4.0,>=2.3.0->panel) (2.8.1)
Requirement already satisfied: Jinja2>=2.9 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from bokeh<2.4.0,>=2.3.0->panel) (3.0.1)
Requirement already satisfied: webencodings in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from bleach->panel) (0.5.1)
Requirement already satisfied: six>=1.9.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from bleach->panel) (1.16.0)
Requirement already satisfied: importlib-metadata>=4.4 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from markdown->panel) (4.5.0)
Requirement already satisfied: chardet<5,>=3.0.2 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from requests->panel) (4.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from requests->panel) (1.26.5)
Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from requests->panel) (2021.5.30)
Requirement already satisfied: idna<3,>=2.5 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from requests->panel) (2.10)
Requirement already satisfied: zipp>=0.5 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from importlib-metadata>=4.4->markdown->panel) (3.4.1)
Requirement already satisfied: MarkupSafe>=2.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from Jinja2>=2.9->bokeh<2.4.0,>=2.3.0->panel) (2.0.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from packaging>=16.8->bokeh<2.4.0,>=2.3.0->panel) (2.4.7)
Requirement already satisfied: sagemaker==1.72.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (1.72.0)
Requirement already satisfied: protobuf3-to-dict>=0.1.5 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from sagemaker==1.72.0) (0.1.5)
Requirement already satisfied: boto3>=1.14.12 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from sagemaker==1.72.0) (1.20.25)
Requirement already satisfied: scipy>=0.19.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from sagemaker==1.72.0) (1.5.3)
Requirement already satisfied: packaging>=20.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from sagemaker==1.72.0) (21.3)
Requirement already satisfied: smdebug-rulesconfig==0.1.4 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from sagemaker==1.72.0) (0.1.4)
Requirement already satisfied: importlib-metadata>=1.4.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from sagemaker==1.72.0) (4.5.0)
Requirement already satisfied: numpy>=1.9.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from sagemaker==1.72.0) (1.19.5)
Requirement already satisfied: protobuf>=3.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from sagemaker==1.72.0) (3.17.2)
Requirement already satisfied: botocore<1.24.0,>=1.23.25 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from boto3>=1.14.12->sagemaker==1.72.0) (1.23.25)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from boto3>=1.14.12->sagemaker==1.72.0) (0.10.0)
Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from boto3>=1.14.12->sagemaker==1.72.0) (0.5.0)
Requirement already satisfied: zipp>=0.5 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from importlib-metadata>=1.4.0->sagemaker==1.72.0) (3.4.1)
Requirement already satisfied: typing-extensions>=3.6.4 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from importlib-metadata>=1.4.0->sagemaker==1.72.0) (3.10.0.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from packaging>=20.0->sagemaker==1.72.0) (2.4.7)
Requirement already satisfied: six>=1.9 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from protobuf>=3.1->sagemaker==1.72.0) (1.16.0)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from botocore<1.24.0,>=1.23.25->boto3>=1.14.12->sagemaker==1.72.0) (1.26.5)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages (from botocore<1.24.0,>=1.23.25->boto3>=1.14.12->sagemaker==1.72.0) (2.8.1)

0. Load all required packages and setup some settings

In [2]:
%load_ext autoreload
%autoreload 2

# Our package
from pandas_profiling import ProfileReport
from pandas_profiling.utils.cache import cache_file
import pandas as pd
import numpy as np
import math
import json
import pandas
from pandas_profiling import ProfileReport
from sklearn.preprocessing import LabelBinarizer, MultiLabelBinarizer, MinMaxScaler
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import os

# pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 150)

1. Data Loading and Exploratory Data Analysis (EDA)

1.1 Portfolio Dataset

In [3]:
# read in the json files
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
In [97]:
portfolio
Out[97]:
reward channels difficulty duration offer_type id
0 10 [email, mobile, social] 10 7 bogo ae264e3637204a6fb9bb56bc8210ddfd
1 10 [web, email, mobile, social] 10 5 bogo 4d5c57ea9a6940dd891ad53e9dbe8da0
2 0 [web, email, mobile] 0 4 informational 3f207df678b143eea3cee63160fa8bed
3 5 [web, email, mobile] 5 7 bogo 9b98b8c7a33c4b65b9aebfe6a799e6d9
4 5 [web, email] 20 10 discount 0b1e1539f2cc45b7b9fa7c272da2e1d7
5 3 [web, email, mobile, social] 7 7 discount 2298d6c36e964ae4a3e7e9706d1fb8c2
6 2 [web, email, mobile, social] 10 10 discount fafdcd668e3743c1bb461111dcafc2a4
7 0 [email, mobile, social] 0 3 informational 5a8bc65990b245e5a138643cd4eb9837
8 5 [web, email, mobile, social] 5 5 bogo f19421c1d4aa40978ebb69ca19b0e20d
9 2 [web, email, mobile] 10 7 discount 2906b810c7d4411798c6938adc9daaa5
In [5]:
print (f"portfolio: -> {portfolio.shape[0]} rows \n {' '*8}  ->  {portfolio.shape[1]} columns")
portfolio: -> 10 rows 
           ->  6 columns
In [6]:
# creating a more representative column for the channels column, only for the eda part
portfolio['channels_eda'] = portfolio['channels'].apply(lambda x: ' '.join(x))
portfolio
Out[6]:
reward channels difficulty duration offer_type id channels_eda
0 10 [email, mobile, social] 10 7 bogo ae264e3637204a6fb9bb56bc8210ddfd email mobile social
1 10 [web, email, mobile, social] 10 5 bogo 4d5c57ea9a6940dd891ad53e9dbe8da0 web email mobile social
2 0 [web, email, mobile] 0 4 informational 3f207df678b143eea3cee63160fa8bed web email mobile
3 5 [web, email, mobile] 5 7 bogo 9b98b8c7a33c4b65b9aebfe6a799e6d9 web email mobile
4 5 [web, email] 20 10 discount 0b1e1539f2cc45b7b9fa7c272da2e1d7 web email
5 3 [web, email, mobile, social] 7 7 discount 2298d6c36e964ae4a3e7e9706d1fb8c2 web email mobile social
6 2 [web, email, mobile, social] 10 10 discount fafdcd668e3743c1bb461111dcafc2a4 web email mobile social
7 0 [email, mobile, social] 0 3 informational 5a8bc65990b245e5a138643cd4eb9837 email mobile social
8 5 [web, email, mobile, social] 5 5 bogo f19421c1d4aa40978ebb69ca19b0e20d web email mobile social
9 2 [web, email, mobile] 10 7 discount 2906b810c7d4411798c6938adc9daaa5 web email mobile
In [7]:
display(portfolio.describe(include='all'))
print ()
display(portfolio.info())
reward channels difficulty duration offer_type id channels_eda
count 10.000000 10 10.000000 10.000000 10 10 10
unique NaN 4 NaN NaN 3 10 4
top NaN [web, email, mobile, social] NaN NaN bogo 3f207df678b143eea3cee63160fa8bed web email mobile social
freq NaN 4 NaN NaN 4 1 4
mean 4.200000 NaN 7.700000 6.500000 NaN NaN NaN
std 3.583915 NaN 5.831905 2.321398 NaN NaN NaN
min 0.000000 NaN 0.000000 3.000000 NaN NaN NaN
25% 2.000000 NaN 5.000000 5.000000 NaN NaN NaN
50% 4.000000 NaN 8.500000 7.000000 NaN NaN NaN
75% 5.000000 NaN 10.000000 7.000000 NaN NaN NaN
max 10.000000 NaN 20.000000 10.000000 NaN NaN NaN
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   reward        10 non-null     int64 
 1   channels      10 non-null     object
 2   difficulty    10 non-null     int64 
 3   duration      10 non-null     int64 
 4   offer_type    10 non-null     object
 5   id            10 non-null     object
 6   channels_eda  10 non-null     object
dtypes: int64(3), object(4)
memory usage: 688.0+ bytes
None

HTML REPORTS CANNOT BE DISPLAYED IN JUPYTER NOTEBOOK. IF YOU WANT TO SEE THE BELOW REPORT HAVE A LOOK IN THE CORRESPONDING HTML DOCUMENT OR IN THE HTML VERSION OF THIS NOTEBOOK

In [98]:
portfolio_report = ProfileReport(
    portfolio.loc[:, portfolio.columns != 'id']
    , title="Portfolio Exploratory Data Analysis Report"
    , html={"style": {"full_width": True}}
    , explorative=True
    , sort=None
)
portfolio_report.to_file("portfolio_report.html")

portfolio_report
/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pandas/core/frame.py:4308: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
Out[98]:

In [9]:
# remove channels_eda column which has been created only for the eda part
portfolio = portfolio.drop(['channels_eda'], axis=1)

1.2 Profile Dataset

In [10]:
# read in the json files
profile = pd.read_json('data/profile.json', orient='records', lines=True)
In [11]:
profile
Out[11]:
gender age id became_member_on income
0 None 118 68be06ca386d4c31939f3a4f0e3dd783 20170212 NaN
1 F 55 0610b486422d4921ae7d2bf64640c50b 20170715 112000.0
2 None 118 38fe809add3b4fcf9315a9694bb96ff5 20180712 NaN
3 F 75 78afa995795e4d85b5d9ceeca43f5fef 20170509 100000.0
4 None 118 a03223e636434f42ac4c3df47e8bac43 20170804 NaN
... ... ... ... ... ...
16995 F 45 6d5f3a774f3d4714ab0c092238f3a1d7 20180604 54000.0
16996 M 61 2cb4f97358b841b9a9773a7aa05a9d77 20180713 72000.0
16997 M 49 01d26f638c274aa0b965d24cefe3183f 20170126 73000.0
16998 F 83 9dc1421481194dcd9400aec7c9ae6366 20160307 50000.0
16999 F 62 e4052622e5ba45a8b96b59aba68cf068 20170722 82000.0

17000 rows × 5 columns

In [12]:
print (f"profile: -> {profile.shape[0]} rows \n {' '*6}  ->      {profile.shape[1]} columns")
profile: -> 17000 rows 
         ->      5 columns
In [13]:
profile.dtypes
Out[13]:
gender               object
age                   int64
id                   object
became_member_on      int64
income              float64
dtype: object
In [14]:
# change became_member_on to date format
profile["became_member_on"] = profile["became_member_on"].apply(lambda x: pd.to_datetime(str(x)))
display(profile.dtypes)
print()
profile
gender                      object
age                          int64
id                          object
became_member_on    datetime64[ns]
income                     float64
dtype: object

Out[14]:
gender age id became_member_on income
0 None 118 68be06ca386d4c31939f3a4f0e3dd783 2017-02-12 NaN
1 F 55 0610b486422d4921ae7d2bf64640c50b 2017-07-15 112000.0
2 None 118 38fe809add3b4fcf9315a9694bb96ff5 2018-07-12 NaN
3 F 75 78afa995795e4d85b5d9ceeca43f5fef 2017-05-09 100000.0
4 None 118 a03223e636434f42ac4c3df47e8bac43 2017-08-04 NaN
... ... ... ... ... ...
16995 F 45 6d5f3a774f3d4714ab0c092238f3a1d7 2018-06-04 54000.0
16996 M 61 2cb4f97358b841b9a9773a7aa05a9d77 2018-07-13 72000.0
16997 M 49 01d26f638c274aa0b965d24cefe3183f 2017-01-26 73000.0
16998 F 83 9dc1421481194dcd9400aec7c9ae6366 2016-03-07 50000.0
16999 F 62 e4052622e5ba45a8b96b59aba68cf068 2017-07-22 82000.0

17000 rows × 5 columns

In [15]:
display(profile.describe(include='all'))
print ()
display(profile.info())
/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ipykernel/__main__.py:1: FutureWarning: Treating datetime data as categorical rather than numeric in `.describe` is deprecated and will be removed in a future version of pandas. Specify `datetime_is_numeric=True` to silence this warning and adopt the future behavior now.
  if __name__ == '__main__':
gender age id became_member_on income
count 14825 17000.000000 17000 17000 14825.000000
unique 3 NaN 17000 1716 NaN
top M NaN dd9594e6ca65431db74456ae27a298e4 2017-12-07 00:00:00 NaN
freq 8484 NaN 1 43 NaN
first NaN NaN NaN 2013-07-29 00:00:00 NaN
last NaN NaN NaN 2018-07-26 00:00:00 NaN
mean NaN 62.531412 NaN NaN 65404.991568
std NaN 26.738580 NaN NaN 21598.299410
min NaN 18.000000 NaN NaN 30000.000000
25% NaN 45.000000 NaN NaN 49000.000000
50% NaN 58.000000 NaN NaN 64000.000000
75% NaN 73.000000 NaN NaN 80000.000000
max NaN 118.000000 NaN NaN 120000.000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17000 entries, 0 to 16999
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   gender            14825 non-null  object        
 1   age               17000 non-null  int64         
 2   id                17000 non-null  object        
 3   became_member_on  17000 non-null  datetime64[ns]
 4   income            14825 non-null  float64       
dtypes: datetime64[ns](1), float64(1), int64(1), object(2)
memory usage: 664.2+ KB
None

HTML REPORTS CANNOT BE DISPLAYED IN JUPYTER NOTEBOOK. IF YOU WANT TO SEE THE BELOW REPORT HAVE A LOOK IN THE CORRESPONDING HTML DOCUMENT OR IN THE HTML VERSION OF THIS NOTEBOOK

In [99]:
profile_report = ProfileReport(profile.loc[:, profile.columns != 'id'], title="Profile Exploratory Data Analysis Report", explorative=True)
profile_report.to_file("profile_report.html")

profile_report
/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pandas/core/frame.py:4308: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
Out[99]:

1.3 Transcript Dataset

In [17]:
# read in the json files
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)
In [18]:
transcript
Out[18]:
person event value time
0 78afa995795e4d85b5d9ceeca43f5fef offer received {'offer id': '9b98b8c7a33c4b65b9aebfe6a799e6d9'} 0
1 a03223e636434f42ac4c3df47e8bac43 offer received {'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'} 0
2 e2127556f4f64592b11af22de27a7932 offer received {'offer id': '2906b810c7d4411798c6938adc9daaa5'} 0
3 8ec6ce2a7e7949b1bf142def7d0e0586 offer received {'offer id': 'fafdcd668e3743c1bb461111dcafc2a4'} 0
4 68617ca6246f4fbc85e91a2a49552598 offer received {'offer id': '4d5c57ea9a6940dd891ad53e9dbe8da0'} 0
... ... ... ... ...
306529 b3a1272bc9904337b331bf348c3e8c17 transaction {'amount': 1.5899999999999999} 714
306530 68213b08d99a4ae1b0dcb72aebd9aa35 transaction {'amount': 9.53} 714
306531 a00058cf10334a308c68e7631c529907 transaction {'amount': 3.61} 714
306532 76ddbd6576844afe811f1a3c0fbb5bec transaction {'amount': 3.5300000000000002} 714
306533 c02b10e8752c4d8e9b73f918558531f7 transaction {'amount': 4.05} 714

306534 rows × 4 columns

In [19]:
print (f"transcript: -> {transcript.shape[0]} rows \n {' '*9}  ->      {transcript.shape[1]} columns")
transcript: -> 306534 rows 
            ->      4 columns
In [20]:
transcript['value_eda'] = transcript.value.apply(lambda x: json.dumps(x).replace(x['offer id'], portfolio.offer_type[portfolio.id==x['offer id']].values[0]) if 'offer id' in x.keys() else json.dumps(x))
transcript
Out[20]:
person event value time value_eda
0 78afa995795e4d85b5d9ceeca43f5fef offer received {'offer id': '9b98b8c7a33c4b65b9aebfe6a799e6d9'} 0 {"offer id": "bogo"}
1 a03223e636434f42ac4c3df47e8bac43 offer received {'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'} 0 {"offer id": "discount"}
2 e2127556f4f64592b11af22de27a7932 offer received {'offer id': '2906b810c7d4411798c6938adc9daaa5'} 0 {"offer id": "discount"}
3 8ec6ce2a7e7949b1bf142def7d0e0586 offer received {'offer id': 'fafdcd668e3743c1bb461111dcafc2a4'} 0 {"offer id": "discount"}
4 68617ca6246f4fbc85e91a2a49552598 offer received {'offer id': '4d5c57ea9a6940dd891ad53e9dbe8da0'} 0 {"offer id": "bogo"}
... ... ... ... ... ...
306529 b3a1272bc9904337b331bf348c3e8c17 transaction {'amount': 1.5899999999999999} 714 {"amount": 1.5899999999999999}
306530 68213b08d99a4ae1b0dcb72aebd9aa35 transaction {'amount': 9.53} 714 {"amount": 9.53}
306531 a00058cf10334a308c68e7631c529907 transaction {'amount': 3.61} 714 {"amount": 3.61}
306532 76ddbd6576844afe811f1a3c0fbb5bec transaction {'amount': 3.5300000000000002} 714 {"amount": 3.5300000000000002}
306533 c02b10e8752c4d8e9b73f918558531f7 transaction {'amount': 4.05} 714 {"amount": 4.05}

306534 rows × 5 columns

In [21]:
display(transcript.describe(include='all'))
print ()
display(transcript.info())
person event value time value_eda
count 306534 306534 306534 306534.000000 306534
unique 17000 4 5121 NaN 5114
top 94de646f7b6041228ca7dec82adb97d2 transaction {'offer id': '2298d6c36e964ae4a3e7e9706d1fb8c2'} NaN {"offer id": "bogo"}
freq 51 138953 14983 NaN 55948
mean NaN NaN NaN 366.382940 NaN
std NaN NaN NaN 200.326314 NaN
min NaN NaN NaN 0.000000 NaN
25% NaN NaN NaN 186.000000 NaN
50% NaN NaN NaN 408.000000 NaN
75% NaN NaN NaN 528.000000 NaN
max NaN NaN NaN 714.000000 NaN
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 306534 entries, 0 to 306533
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   person     306534 non-null  object
 1   event      306534 non-null  object
 2   value      306534 non-null  object
 3   time       306534 non-null  int64 
 4   value_eda  306534 non-null  object
dtypes: int64(1), object(4)
memory usage: 11.7+ MB
None

HTML REPORTS CANNOT BE DISPLAYED IN JUPYTER NOTEBOOK. IF YOU WANT TO SEE THE BELOW REPORT HAVE A LOOK IN THE CORRESPONDING HTML DOCUMENT OR IN THE HTML VERSION OF THIS NOTEBOOK

In [100]:
transcript_report = ProfileReport(transcript.loc[:, transcript.columns != 'value'], title="Transcript Exploratory Data Analysis Report", explorative=True)
transcript_report.to_file("transcript_report.html")

transcript_report
/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pandas/core/frame.py:4308: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
Out[100]:

2. Data Cleaning / Transformation

2.1 Cleaning & Transforming Portfolio Dataset

Looking at the exploratory data analysis of the portfolio dataset at the section 1.1:

  • No actual need of cleaning the values of any of the columns (there are no null values or any outliers)
  • Need to rename the existing column names to more representative feature names which will be used later on in the final dataframe
    • id -> offer_id
    • reward -> offer_reward
    • duration -> offer_duration_days
    • difficulty -> offer_difficulty
  • One hot encode the channels columns
  • Remove the offer_type and channels columns and replace them with their corresponding one hot encoded values
In [23]:
def clean_transform_portfolio(portfolio):
    """
    Function to clean and transform the portfolio  
    dataframe based on above requirements    
    
    Parameters 
        portfolio: portfolio dataframe
    
    Returns
        portfolio_cleaned: the cleaned portfolio dataframe
    """
    
    # rename columns
    portfolio = portfolio.rename(columns = {'id':'offer_id', 
                                            'reward':'offer_reward', 
                                            'duration':'offer_duration_days', 
                                            'difficulty':'offer_difficulty'})
    
    # one hot encode channels
    multi_label_binarizer = MultiLabelBinarizer()
    multi_label_binarizer.fit(portfolio['channels'])
    
    channels_encoded = pd.DataFrame(
        multi_label_binarizer.transform(portfolio['channels']), 
        columns=multi_label_binarizer.classes_)
    
    # remove offer_type and channels columns
    portfolio = portfolio.drop(['channels'], axis=1) 
    
    # create final portfolio_cleaned dataframe by adding the encoded columns
    portfolio_cleaned = pd.concat([portfolio, channels_encoded], axis=1) 
    
    # put all columns in a more representative order
    columns_order = ['offer_id', 'offer_type', 'offer_duration_days', 'offer_difficulty',
                     'offer_reward', 'email', 'mobile', 'social', 'web']
    
    return portfolio_cleaned[columns_order]
In [24]:
portfolio_cleaned = clean_transform_portfolio(portfolio)
portfolio_cleaned
Out[24]:
offer_id offer_type offer_duration_days offer_difficulty offer_reward email mobile social web
0 ae264e3637204a6fb9bb56bc8210ddfd bogo 7 10 10 1 1 1 0
1 4d5c57ea9a6940dd891ad53e9dbe8da0 bogo 5 10 10 1 1 1 1
2 3f207df678b143eea3cee63160fa8bed informational 4 0 0 1 1 0 1
3 9b98b8c7a33c4b65b9aebfe6a799e6d9 bogo 7 5 5 1 1 0 1
4 0b1e1539f2cc45b7b9fa7c272da2e1d7 discount 10 20 5 1 0 0 1
5 2298d6c36e964ae4a3e7e9706d1fb8c2 discount 7 7 3 1 1 1 1
6 fafdcd668e3743c1bb461111dcafc2a4 discount 10 10 2 1 1 1 1
7 5a8bc65990b245e5a138643cd4eb9837 informational 3 0 0 1 1 1 0
8 f19421c1d4aa40978ebb69ca19b0e20d bogo 5 5 5 1 1 1 1
9 2906b810c7d4411798c6938adc9daaa5 discount 7 10 2 1 1 0 1

2.2 Cleaning & Transforming Profile Dataset

Looking at the exploratory data analysis of the profile dataset at the section 1.2:

  • There are no extreme values for any of the columns. However, there are approximately 12.8% of missing values which should be removed
  • Extract only the year information from the became_member_on column
  • Need to rename the existing column names to more representative feature names which will be used later on in the final dataframe
    • id -> customer_id
    • age -> customer_age
    • became_member_on -> customer_registration_year
    • income -> customer_income
In [25]:
print ('Any NULL value on any row in the profile dataset is equal to:', round((1- (profile.dropna().shape[0]/profile.shape[0])), 3)*100, '%')
print ('Number of customers with 118 age and missing values in general', round(profile['income'][profile.age==118].shape[0]/profile.shape[0], 3)*100, '%')
Any NULL value on any row in the profile dataset is equal to: 12.8 %
Number of customers with 118 age and missing values in general 12.8 %
In [26]:
def clean_transform_profile(profile):
    """
    Function to clean and transform the profile  
    dataframe based on above requirements    
    
    Parameters 
        profile: profile dataframe
    
    Returns
        profile_cleaned: the cleaned profile dataframe
    """
    
    # remove any customer with missing value (12.8% of all customers have a missing value and age assigned as 118)
    profile_cleaned = profile.dropna().reset_index()
    
    # keep only the year that each customer became member
    profile_cleaned['became_member_on'] = pd.DatetimeIndex(profile_cleaned.became_member_on).year
    
    # rename columns
    profile_cleaned = profile_cleaned.rename(columns = {'became_member_on':'customer_registration_year',
                                                        'income':'customer_income',
                                                        'id':'customer_id', 
                                                        'age':'customer_age',
                                                        'gender':'customer_gender'})
    
    # put all columns in a more representative order
    columns_order = ['customer_id', 'customer_age', 'customer_gender', 
                     'customer_income', 'customer_registration_year']
    
    return profile_cleaned[columns_order]
In [27]:
profile_cleaned = clean_transform_profile(profile)
profile_cleaned
Out[27]:
customer_id customer_age customer_gender customer_income customer_registration_year
0 0610b486422d4921ae7d2bf64640c50b 55 F 112000.0 2017
1 78afa995795e4d85b5d9ceeca43f5fef 75 F 100000.0 2017
2 e2127556f4f64592b11af22de27a7932 68 M 70000.0 2018
3 389bc3fa690240e798340f5a15918d5c 65 M 53000.0 2018
4 2eeac8d8feae4a8cad5a6af0499a211d 58 M 51000.0 2017
... ... ... ... ... ...
14820 6d5f3a774f3d4714ab0c092238f3a1d7 45 F 54000.0 2018
14821 2cb4f97358b841b9a9773a7aa05a9d77 61 M 72000.0 2018
14822 01d26f638c274aa0b965d24cefe3183f 49 M 73000.0 2017
14823 9dc1421481194dcd9400aec7c9ae6366 83 F 50000.0 2016
14824 e4052622e5ba45a8b96b59aba68cf068 62 F 82000.0 2017

14825 rows × 5 columns

2.3 Cleaning & Transforming Transcript Dataset

2.3.1 Cleaning Transcript Dataset

Looking at the exploratory data analysis of the profile dataset at the section 1.3:

  • No actual need of cleaning the values of any of the columns (there are no null values or any outliers)
  • Need to remove value_eda created in step 1.3
  • Need to rename key values in value column from offer id to offer_id for any instance that this is true
  • Need to remove any customer that is not appear in the profile dataset
  • Need to rename the existing column names to more representative feature names which will be used later on in the final dataframe
    • person'-> customer_id
    • time -> time_hours
In [28]:
def clean_transcript(transcript, profile):
    """
    Function to clean  the transcript dataframe  
    based on above requirements    
    
    Parameters 
        transcript: transcript dataframe
        profile: cleaned profile dataframe
    
    Returns
        transcript_cleaned: the cleaned transcript dataframe
    """
    
    # remove value_eda column
    transcript = transcript.drop(['value_eda'], axis=1)
    
    # rename offer id to offer_id if applicable 
    transcript['value'] = transcript.value.apply(lambda x: {'offer_id': x['offer id']} if 'offer id' in x.keys() and 'reward' not in x.keys() else x)   
    
    # remove any customer that is not appear in the profile dataset
    all_customers_list = list(profile['customer_id'])
    transcript = transcript[transcript.person.apply(lambda x: x in all_customers_list)].reset_index()
    
    # rename columns
    transcript_cleaned = transcript.rename(columns = {'person':'customer_id',
                                                      'time':'time_hours',})

    # put all columns in a more representative order
    columns_order = ['customer_id', 'event', 'value', 'time_hours']
    
    return transcript_cleaned[columns_order]
In [29]:
transcript_cleaned = clean_transcript(transcript, profile_cleaned)
transcript_cleaned
Out[29]:
customer_id event value time_hours
0 78afa995795e4d85b5d9ceeca43f5fef offer received {'offer_id': '9b98b8c7a33c4b65b9aebfe6a799e6d9'} 0
1 e2127556f4f64592b11af22de27a7932 offer received {'offer_id': '2906b810c7d4411798c6938adc9daaa5'} 0
2 389bc3fa690240e798340f5a15918d5c offer received {'offer_id': 'f19421c1d4aa40978ebb69ca19b0e20d'} 0
3 2eeac8d8feae4a8cad5a6af0499a211d offer received {'offer_id': '3f207df678b143eea3cee63160fa8bed'} 0
4 aa4862eba776480b8bb9c68455b8c2e1 offer received {'offer_id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'} 0
... ... ... ... ...
272757 24f56b5e1849462093931b164eb803b5 offer completed {'offer_id': 'fafdcd668e3743c1bb461111dcafc2a4', 'reward': 2} 714
272758 b3a1272bc9904337b331bf348c3e8c17 transaction {'amount': 1.5899999999999999} 714
272759 68213b08d99a4ae1b0dcb72aebd9aa35 transaction {'amount': 9.53} 714
272760 a00058cf10334a308c68e7631c529907 transaction {'amount': 3.61} 714
272761 76ddbd6576844afe811f1a3c0fbb5bec transaction {'amount': 3.5300000000000002} 714

272762 rows × 4 columns

2.3.2 Transforming Transcript Dataset

This is a really important step as the label of our dataset will be created.
The label will be named successful_offer and it will be a binary feature with values 0 or 1. These values will represent if an offer was a successful offer, from the company's perspective, for this customer or not.

An successful offer is the offer which follows the below steps:

  • Offer Received -> Offer Viewed -> Offer Completed
In [30]:
def offer_summary(transcript, portfolio):
    """
    Function to create a summary of all offers for each customer which will present  
    which offer was successful for each customer based on the above logic    
    
    Parameters 
        transcript: cleaned transcript dataframe
        portfolio: cleaned portfolio dataframe
    
    Returns
        offer_summary: a dataframe with customer_id, offer_id and corresponding label
    """
    
    offer_received = transcript_cleaned.query("event=='offer received'").reset_index(drop=True)
    offer_received.value = offer_received.value.apply(lambda x: x['offer_id'])

    offer_viewed = transcript_cleaned.query("event=='offer viewed'").reset_index(drop=True)
    offer_viewed.value = offer_viewed.value.apply(lambda x: x['offer_id'])

    offer_completed = transcript_cleaned.query("event=='offer completed'").reset_index(drop=True)
    offer_completed.value = offer_completed.value.apply(lambda x: x['offer_id'])
    
    offer_summary_list = list()
    for index, received_row in offer_received.iterrows():
        received_customer_id = received_row.customer_id
        received_offer_id = received_row.value
        received_time_hours = received_row.time_hours
        offer_duration_hours = (portfolio_cleaned.query(f"offer_id=='{received_offer_id}'").offer_duration_days).values[0]*24

        viewed_df = offer_viewed.query(f"time_hours>={received_time_hours} & time_hours<={received_time_hours+offer_duration_hours} & value=='{received_offer_id}' & customer_id=='{received_customer_id}'")

        if not viewed_df.empty:
            viewed_time_hours = viewed_df.iloc[0].time_hours

            completed_df = offer_completed.query(f"time_hours>={viewed_time_hours} & time_hours<={received_time_hours+offer_duration_hours} & value=='{received_offer_id}' & customer_id=='{received_customer_id}'")
            if not completed_df.empty:            
                successful_offer = 1
        else:
            successful_offer = 0

        offer_summary_list.append([received_customer_id, received_offer_id, successful_offer])

    return pd.DataFrame(offer_summary_list, columns=['customer_id', 'offer_id', 'successful_offer'])
In [96]:
if os.path.isfile('offer_summary.csv'):
    offer_summary = pd.read_csv('offer_summary.csv')
else:
    offer_summary = offer_summary(transcript_cleaned, portfolio_cleaned)
    offer_summary.to_csv('offer_summary.csv', index=False)
    
offer_summary
Out[96]:
customer_id offer_id successful_offer
0 78afa995795e4d85b5d9ceeca43f5fef 9b98b8c7a33c4b65b9aebfe6a799e6d9 1
1 e2127556f4f64592b11af22de27a7932 2906b810c7d4411798c6938adc9daaa5 1
2 389bc3fa690240e798340f5a15918d5c f19421c1d4aa40978ebb69ca19b0e20d 1
3 2eeac8d8feae4a8cad5a6af0499a211d 3f207df678b143eea3cee63160fa8bed 0
4 aa4862eba776480b8bb9c68455b8c2e1 0b1e1539f2cc45b7b9fa7c272da2e1d7 0
... ... ... ...
66496 d087c473b4d247ccb0abfef59ba12b0e ae264e3637204a6fb9bb56bc8210ddfd 1
66497 cb23b66c56f64b109d673d5e56574529 2906b810c7d4411798c6938adc9daaa5 0
66498 6d5f3a774f3d4714ab0c092238f3a1d7 2298d6c36e964ae4a3e7e9706d1fb8c2 0
66499 9dc1421481194dcd9400aec7c9ae6366 ae264e3637204a6fb9bb56bc8210ddfd 0
66500 e4052622e5ba45a8b96b59aba68cf068 3f207df678b143eea3cee63160fa8bed 0

66501 rows × 3 columns

2.4 Data Merge / Final Dataframe

In [33]:
def data_merge(portfolio, profile, offer_summary):
    """
    Creation of the final table by merging
    the final cleaned data frames
       
    Parameters
    ---------- 
    portfolio : cleaned and transformed portfolio data frame
    profile : cleaned and transformed profile data frame
    offer_summary : final offer_summary table as defined above
      
    Returns
    -------
    merged_df: merged data frame to be used in our problem
    
    """
    
    merged_df = pd.merge(offer_summary, profile, on='customer_id')
    merged_df = pd.merge(merged_df, portfolio, on='offer_id')
    
    columns_order = ['customer_age', 'customer_gender', 'customer_income', 'customer_registration_year',
                     'offer_type', 'offer_duration_days', 'offer_difficulty', 'offer_reward',
                     'email', 'mobile', 'social', 'web', 'successful_offer']
    
    return merged_df[columns_order]
In [34]:
merged_df = data_merge(portfolio_cleaned, profile_cleaned, offer_summary)
merged_df
Out[34]:
customer_age customer_gender customer_income customer_registration_year offer_type offer_duration_days offer_difficulty offer_reward email mobile social web successful_offer
0 75 F 100000.0 2017 bogo 7 5 5 1 1 0 1 1
1 68 M 70000.0 2018 bogo 7 5 5 1 1 0 1 1
2 65 M 53000.0 2018 bogo 7 5 5 1 1 0 1 1
3 65 M 53000.0 2018 bogo 7 5 5 1 1 0 1 1
4 56 F 88000.0 2018 bogo 7 5 5 1 1 0 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ...
66496 48 M 58000.0 2018 bogo 5 10 10 1 1 1 1 1
66497 44 F 81000.0 2016 bogo 5 10 10 1 1 1 1 1
66498 47 M 94000.0 2017 bogo 5 10 10 1 1 1 1 0
66499 61 F 60000.0 2014 bogo 5 10 10 1 1 1 1 1
66500 58 F 78000.0 2016 bogo 5 10 10 1 1 1 1 1

66501 rows × 13 columns

3. EDA & Feature Engineering on Final Table

3.1 Pandas Profiling Report (EDA)

In [35]:
display(merged_df.describe(include='all'))
print ()
display(merged_df.info())
customer_age customer_gender customer_income customer_registration_year offer_type offer_duration_days offer_difficulty offer_reward email mobile social web successful_offer
count 66501.000000 66501 66501.000000 66501.000000 66501 66501.000000 66501.00000 66501.000000 66501.0 66501.000000 66501.000000 66501.000000 66501.000000
unique NaN 3 NaN NaN 3 NaN NaN NaN NaN NaN NaN NaN NaN
top NaN M NaN NaN discount NaN NaN NaN NaN NaN NaN NaN NaN
freq NaN 38129 NaN NaN 26664 NaN NaN NaN NaN NaN NaN NaN NaN
mean 54.369258 NaN 65371.618472 2016.622021 NaN 6.507571 7.71417 4.198824 1.0 0.898859 0.598517 0.799612 0.567119
std 17.395430 NaN 21623.288473 1.198364 NaN 2.204416 5.54754 3.398100 0.0 0.301518 0.490202 0.400294 0.495478
min 18.000000 NaN 30000.000000 2013.000000 NaN 3.000000 0.00000 0.000000 1.0 0.000000 0.000000 0.000000 0.000000
25% 42.000000 NaN 49000.000000 2016.000000 NaN 5.000000 5.00000 2.000000 1.0 1.000000 0.000000 1.000000 0.000000
50% 55.000000 NaN 64000.000000 2017.000000 NaN 7.000000 10.00000 5.000000 1.0 1.000000 1.000000 1.000000 1.000000
75% 66.000000 NaN 80000.000000 2017.000000 NaN 7.000000 10.00000 5.000000 1.0 1.000000 1.000000 1.000000 1.000000
max 101.000000 NaN 120000.000000 2018.000000 NaN 10.000000 20.00000 10.000000 1.0 1.000000 1.000000 1.000000 1.000000
<class 'pandas.core.frame.DataFrame'>
Int64Index: 66501 entries, 0 to 66500
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   customer_age                66501 non-null  int64  
 1   customer_gender             66501 non-null  object 
 2   customer_income             66501 non-null  float64
 3   customer_registration_year  66501 non-null  int64  
 4   offer_type                  66501 non-null  object 
 5   offer_duration_days         66501 non-null  int64  
 6   offer_difficulty            66501 non-null  int64  
 7   offer_reward                66501 non-null  int64  
 8   email                       66501 non-null  int64  
 9   mobile                      66501 non-null  int64  
 10  social                      66501 non-null  int64  
 11  web                         66501 non-null  int64  
 12  successful_offer            66501 non-null  int64  
dtypes: float64(1), int64(10), object(2)
memory usage: 7.1+ MB
None

HTML REPORTS CANNOT BE DISPLAYED IN JUPYTER NOTEBOOK. IF YOU WANT TO SEE THE BELOW REPORT HAVE A LOOK IN THE CORRESPONDING HTML DOCUMENT OR IN THE HTML VERSION OF THIS NOTEBOOK

In [101]:
merged_df_report = ProfileReport(merged_df, title="Final Table Exploratory Data Analysis Report", explorative=True)
merged_df_report.to_file("merged_df_report.html")

merged_df_report
/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pandas_profiling/model/correlations.py:210: RuntimeWarning: invalid value encountered in greater_equal
  bool_index = abs(correlation_matrix.values) >= threshold
Out[101]:

3.2 Manual EDA

In [37]:
merged_df
Out[37]:
customer_age customer_gender customer_income customer_registration_year offer_type offer_duration_days offer_difficulty offer_reward email mobile social web successful_offer
0 75 F 100000.0 2017 bogo 7 5 5 1 1 0 1 1
1 68 M 70000.0 2018 bogo 7 5 5 1 1 0 1 1
2 65 M 53000.0 2018 bogo 7 5 5 1 1 0 1 1
3 65 M 53000.0 2018 bogo 7 5 5 1 1 0 1 1
4 56 F 88000.0 2018 bogo 7 5 5 1 1 0 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ...
66496 48 M 58000.0 2018 bogo 5 10 10 1 1 1 1 1
66497 44 F 81000.0 2016 bogo 5 10 10 1 1 1 1 1
66498 47 M 94000.0 2017 bogo 5 10 10 1 1 1 1 0
66499 61 F 60000.0 2014 bogo 5 10 10 1 1 1 1 1
66500 58 F 78000.0 2016 bogo 5 10 10 1 1 1 1 1

66501 rows × 13 columns

Customer Age & Customer Income

In [38]:
sns.pairplot(merged_df[['customer_age', 'customer_income', 'successful_offer']], hue='successful_offer', height=5)
plt.show()

Gender

In [39]:
plt.figure(figsize=(12, 4))
sns.countplot(hue="successful_offer", x= "customer_gender", data=merged_df)
plt.show()

Offer Type

In [40]:
plt.figure(figsize=(12, 4))
sns.countplot(hue="successful_offer", x= "offer_type", data=merged_df)
plt.show()

Offer Duration

In [41]:
plt.figure(figsize=(12, 4))
sns.countplot(hue="successful_offer", x= "offer_duration_days", data=merged_df)
plt.show()

Offer Difficulty

In [42]:
plt.figure(figsize=(12, 4))
sns.countplot(hue="successful_offer", x= "offer_difficulty", data=merged_df)
plt.show()

Offer Reward

In [43]:
plt.figure(figsize=(12, 4))
sns.countplot(hue="successful_offer", x= "offer_reward", data=merged_df)
plt.show()

Registration Year

In [44]:
plt.figure(figsize=(12, 4))
sns.countplot(hue="successful_offer", x= "customer_registration_year", data=merged_df)
plt.show()

3.3 Feature Cleaning & Engineering based on above results

Looking at the exploratory data analysis of the final dataset above:

  • Categorical features:
    • One hot encode offer_type column
    • One hot encode customer_gender column
    • One hot encode customer_registration_year column
  • Numerical features:

    • Scale and normalize customer_age column
    • Scale and normalize customer_income column
    • Scale and normalize offer_duration_days column
    • Scale and normalize offer_difficulty column
    • Scale and normalize offer_reward column
  • Gender O has less than 2% of data and as a result this column should be dropped because it does not bring any value to the final model

  • Channel email appears to all offers, so 100% of its values are constant and as a result this column should be dropped
  • Gender M is highly correlated with gender F, so one of the two has to be dropped. This is because highly correlated features do not work well in machine learning models and may negatively influence the performance of them
  • There are some duplicate rows and should be dropped
In [45]:
def transform_engineer_merged_df(merged_df):
    """
    Function to clean and transform the final   
    dataframe based on above requirements    
    
    Parameters 
        merged_df: the final merged dataframe
        
    Returns
        final_df: the transformed transcript dataframe
    """

    # one hot encode offer_type
    offer_type_encoded = pd.get_dummies(merged_df['offer_type'])
    
    # one hot encode offer_type
    customer_gender_encoded = pd.get_dummies(merged_df['customer_gender'])

    # one hot encode customer_registration_year
    customer_registration_year_encoded = pd.get_dummies(merged_df['customer_registration_year'])
    
    # create final profile_cleaned dataframe by adding the encoded columns
    final_df = pd.concat([merged_df, offer_type_encoded, customer_gender_encoded, customer_registration_year_encoded], axis=1)

    # scale numerical features
    scaler = MinMaxScaler() 
    numerical_features = ['customer_age', 'customer_income', 'offer_duration_days', 'offer_difficulty', 'offer_reward']
    final_df[numerical_features] = scaler.fit_transform(final_df[numerical_features])
    
    # remove the columns based on above requirements and the encoded columns
    final_df = final_df.drop(['offer_type', 'customer_gender', 'customer_registration_year', 'O', 'email', 'F'], axis=1)
    
    # drop duplicates
    final_df = final_df.drop_duplicates().reset_index(drop=True)

    # order the resulted columns
    columns_order = ['customer_age', 'M', 'customer_income'] + list(customer_registration_year_encoded.columns) + \
                    ['bogo', 'discount', 'informational', 'offer_duration_days', 'offer_difficulty', 'offer_reward',
                     'mobile', 'social', 'web', 'successful_offer']
    
    return final_df[columns_order]
In [46]:
final_df = transform_engineer_merged_df(merged_df)
final_df
Out[46]:
customer_age M customer_income 2013 2014 2015 2016 2017 2018 bogo discount informational offer_duration_days offer_difficulty offer_reward mobile social web successful_offer
0 0.686747 0 0.777778 0 0 0 0 1 0 1 0 0 0.571429 0.25 0.5 1 0 1 1
1 0.602410 1 0.444444 0 0 0 0 0 1 1 0 0 0.571429 0.25 0.5 1 0 1 1
2 0.566265 1 0.255556 0 0 0 0 0 1 1 0 0 0.571429 0.25 0.5 1 0 1 1
3 0.457831 0 0.644444 0 0 0 0 0 1 1 0 0 0.571429 0.25 0.5 1 0 1 1
4 0.457831 0 0.644444 0 0 0 0 0 1 1 0 0 0.571429 0.25 0.5 1 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
55447 0.361446 1 0.311111 0 0 0 0 0 1 1 0 0 0.285714 0.50 1.0 1 1 1 1
55448 0.313253 0 0.566667 0 0 0 1 0 0 1 0 0 0.285714 0.50 1.0 1 1 1 1
55449 0.349398 1 0.711111 0 0 0 0 1 0 1 0 0 0.285714 0.50 1.0 1 1 1 0
55450 0.518072 0 0.333333 0 1 0 0 0 0 1 0 0 0.285714 0.50 1.0 1 1 1 1
55451 0.481928 0 0.533333 0 0 0 1 0 0 1 0 0 0.285714 0.50 1.0 1 1 1 1

55452 rows × 19 columns

4. Prepare and split the data in training, validation and test sets

On this step we will split the data into three different sets (training, validation & test sets)

  • The rules are the below:
    • 30% of the data to be left on the side for the testing set
    • No balance on the testing set
    • Balance the data in both training and validation sets (equal number of successful and unseccessful offers per offer type)
In [47]:
class data_split():
    """
    Class to split the data in training and test set
    with the option to balance the dataset as well
    
    Parameters 
        df: the final merged dataframe
        label: the target label
        
    Returns
        X_train, X_test, y_train, y_test: the training and test sets
    """
    
    def __init__(self, df, label):
        self.X = df.loc[:, df.columns != label]
        self.y = df[label]
    
    def balance_data(self, X, y):
        
        X_bogo = X.query('bogo==1')
        X_discount  = X.query('discount==1')
        X_informational = X.query('informational==1')
        
        index_0 = list()
        index_1 = list()
        for temp_X in [X_bogo, X_discount, X_informational]:
            temp_y = y.loc[list(temp_X.index)]
            
            count_0, count_1 = temp_y.value_counts()[0], temp_y.value_counts()[1]

            if count_1>count_0:
                index_0.append(list(temp_y[temp_y==0].index))
                index_1.append(list(temp_y[temp_y==1].sample(n=count_0, random_state=42).index))
            else:
                index_1.append(list(temp_y[temp_y==1].index))
                index_0.append(list(temp_y[temp_y==0].sample(n=count_1, random_state=42).index))

        X = X.loc[[j for sub in index_0+index_1 for j in sub]]
        y = y.loc[[j for sub in index_0+index_1 for j in sub]]
        
        return X, y
    
    def train_test_val_split(self):
        
        X_train, X_test, y_train, y_test = train_test_split(self.X, self.y, test_size=0.3, random_state=42)
        X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=42)

        X_train, y_train = self.balance_data(X_train, y_train)
        X_val, y_val = self.balance_data(X_val, y_val)

        return X_train, X_val, X_test, y_train, y_val, y_test
In [48]:
ds = data_split(final_df, 'successful_offer')

X_train, X_val, X_test, y_train, y_val, y_test = ds.train_test_val_split()
X_train.shape, X_val.shape, X_test.shape, y_train.shape, y_val.shape, y_test.shape
Out[48]:
((21178, 18), (9116, 18), (16636, 18), (21178,), (9116,), (16636,))

5. Benchmark Model

5.1 Decision Trees using all offers together

In [49]:
from sklearn import tree

clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
In [50]:
y_pred = clf.predict(X_test)
accuracy_score(y_test, y_pred)
Out[50]:
0.5384106756431835
In [51]:
clf.predict_proba(X_test)
Out[51]:
array([[0., 1.],
       [0., 1.],
       [1., 0.],
       ...,
       [0., 1.],
       [0., 1.],
       [0., 1.]])

5.2 Decision Trees using only bogo

Prepare the training & test sets

In [52]:
X_train_bogo = X_train.query("bogo==1").drop(columns=['bogo', 'discount', 'informational'])
y_train_bogo = y_train.loc[X_train_bogo.index]

X_test_bogo = X_test.query("bogo==1").drop(columns=['bogo', 'discount', 'informational'])
y_test_bogo = y_test.loc[X_test_bogo.index]

X_train_bogo.shape, y_train_bogo.shape, X_test_bogo.shape, y_test_bogo.shape
Out[52]:
((7942, 15), (7942,), (6654, 15), (6654,))
In [53]:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train_bogo, y_train_bogo)
In [54]:
y_pred = clf.predict(X_test_bogo)
accuracy_score(y_test_bogo, y_pred)
Out[54]:
0.5108205590622182

Prepare validation sets for XGBoost

In [55]:
X_val_bogo = X_val.query("bogo==1").drop(columns=['bogo', 'discount', 'informational'])
y_val_bogo = y_val.loc[X_val_bogo.index]

X_val_bogo.shape, y_val_bogo.shape
Out[55]:
((3464, 15), (3464,))

5.2 Decision Trees using only discount

Prepare the training & test sets

In [56]:
X_train_discount = X_train.query("discount==1").drop(columns=['bogo', 'discount', 'informational'])
y_train_discount = y_train.loc[X_train_discount.index]

X_test_discount = X_test.query("discount==1").drop(columns=['bogo', 'discount', 'informational'])
y_test_discount = y_test.loc[X_test_discount.index]

X_train_discount.shape, y_train_discount.shape, X_test_discount.shape, y_test_discount.shape
Out[56]:
((8938, 15), (8938,), (6568, 15), (6568,))
In [57]:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train_discount, y_train.loc[X_train_discount.index])
In [58]:
y_pred = clf.predict(X_test_discount)
accuracy_score(y_test.loc[X_test_discount.index], y_pred)
Out[58]:
0.584652862362972

Prepare validation sets for XGBoost

In [59]:
X_val_discount = X_val.query("discount==1").drop(columns=['bogo', 'discount', 'informational'])
y_val_discount = y_val.loc[X_val_discount.index]

X_val_discount.shape, y_val_discount.shape
Out[59]:
((3926, 15), (3926,))

5.2 Decision Trees using only informational

Prepare the training & test sets

In [60]:
X_train_informational = X_train.query("informational==1").drop(columns=['bogo', 'discount', 'informational'])
y_train_informational = y_train.loc[X_train_informational.index]

X_test_informational = X_test.query("informational==1").drop(columns=['bogo', 'discount', 'informational'])
y_test_informational = y_test.loc[X_test_informational.index]

X_train_informational.shape, y_train_informational.shape, X_test_informational.shape, y_test_informational.shape
Out[60]:
((4298, 15), (4298,), (3414, 15), (3414,))
In [61]:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train_informational, y_train.loc[X_train_informational.index])
In [62]:
y_pred = clf.predict(X_test_informational)
accuracy_score(y_test.loc[X_test_informational.index], y_pred)
Out[62]:
0.4961921499707089

Prepare validation sets for XGBoost

In [63]:
X_val_informational = X_val.query("informational==1").drop(columns=['bogo', 'discount', 'informational'])
y_val_informational = y_val.loc[X_val_informational.index]

X_val_informational.shape, y_val_informational.shape
Out[63]:
((1726, 15), (1726,))

6. XGBoost Modelling

XGBoost models will be created and used on AWS

6.1 Define session

In [64]:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.predictor import csv_serializer

# this is an object that represents the SageMaker session that we are currently operating in. This
# object contains some useful information that we will need to access later such as our region.
session = sagemaker.Session()

# this is an object that represents the IAM role that we are currently assigned. When we construct
# and launch the training job later we will need to tell it what IAM role it should have. Since our
# use case is relatively simple we will simply assign the training job the role we currently have.
role = get_execution_role()

6.2: Uploading the data files to S3

When a training job is constructed using SageMaker, a container is executed which performs the training operation. This container is given access to data that is stored in S3. This means that we need to upload the data we want to use for training to S3. In addition, when we perform a batch transform job, SageMaker expects the input data to be stored on S3. We can use the SageMaker API to do this and hide some of the details.

Save the data locally

First we need to create the test, train and validation csv files which we will then upload to S3.

In [65]:
# this is our local data directory. We need to make sure that it exists.
data_dir = '../Starbucks-Capstone-Project/data'
if not os.path.exists(data_dir):
    os.makedirs(data_dir)
In [66]:
# we use pandas to save our test, train and validation data to csv files. Note that we make sure not to include header
# information or an index as this is required by the built in algorithms provided by Amazon. Also, for the train and
# validation data, it is assumed that the first entry in each row is the target variable.

X_test.to_csv(os.path.join(data_dir, 'all_offers', 'test.csv'), header=False, index=False)
pd.concat([y_val, X_val], axis=1).to_csv(os.path.join(data_dir, 'all_offers', 'validation.csv'), header=False, index=False)
pd.concat([y_train, X_train], axis=1).to_csv(os.path.join(data_dir, 'all_offers', 'train.csv'), header=False, index=False)

X_test_bogo.to_csv(os.path.join(data_dir, 'bogo', 'test_bogo.csv'), header=False, index=False)
pd.concat([y_val_bogo, X_val_bogo], axis=1).to_csv(os.path.join(data_dir, 'bogo', 'validation_bogo.csv'), header=False, index=False)
pd.concat([y_train_bogo, X_train_bogo], axis=1).to_csv(os.path.join(data_dir, 'bogo', 'train_bogo.csv'), header=False, index=False)

X_test_discount.to_csv(os.path.join(data_dir, 'discount', 'test_discount.csv'), header=False, index=False)
pd.concat([y_val_discount, X_val_discount], axis=1).to_csv(os.path.join(data_dir, 'discount', 'validation_discount.csv'), header=False, index=False)
pd.concat([y_train_discount, X_train_discount], axis=1).to_csv(os.path.join(data_dir, 'discount', 'train_discount.csv'), header=False, index=False)

X_test_informational.to_csv(os.path.join(data_dir, 'informational', 'test_informational.csv'), header=False, index=False)
pd.concat([y_val_informational, X_val_informational], axis=1).to_csv(os.path.join(data_dir, 'informational', 'validation_informational.csv'), header=False, index=False)
pd.concat([y_train_informational, X_train_informational], axis=1).to_csv(os.path.join(data_dir, 'informational', 'train_informational.csv'), header=False, index=False)

Upload to S3

Since we are currently running inside of a SageMaker session, we can use the object which represents this session to upload our data to the 'default' S3 bucket. Note that it is good practice to provide a custom prefix (essentially an S3 folder) to make sure that you don't accidentally interfere with data uploaded from some other notebook or project.

In [67]:
prefix = 'starbucks-xgboost'

test_location = session.upload_data(os.path.join(data_dir, 'all_offers', 'test.csv'), key_prefix=prefix)
val_location = session.upload_data(os.path.join(data_dir, 'all_offers', 'validation.csv'), key_prefix=prefix)
train_location = session.upload_data(os.path.join(data_dir, 'all_offers', 'train.csv'), key_prefix=prefix)

test_location_bogo = session.upload_data(os.path.join(data_dir, 'bogo', 'test_bogo.csv'), key_prefix=prefix)
val_location_bogo = session.upload_data(os.path.join(data_dir, 'bogo', 'validation_bogo.csv'), key_prefix=prefix)
train_location_bogo = session.upload_data(os.path.join(data_dir, 'bogo', 'train_bogo.csv'), key_prefix=prefix)

test_location_discount = session.upload_data(os.path.join(data_dir, 'discount', 'test_discount.csv'), key_prefix=prefix)
val_location_discount = session.upload_data(os.path.join(data_dir, 'discount', 'validation_discount.csv'), key_prefix=prefix)
train_location_discount = session.upload_data(os.path.join(data_dir, 'discount', 'train_discount.csv'), key_prefix=prefix)

test_location_informational = session.upload_data(os.path.join(data_dir, 'informational', 'test_informational.csv'), key_prefix=prefix)
val_location_informational = session.upload_data(os.path.join(data_dir, 'informational', 'validation_informational.csv'), key_prefix=prefix)
train_location_informational = session.upload_data(os.path.join(data_dir, 'informational', 'train_informational.csv'), key_prefix=prefix)

6.3 Train and test the XGBoost models

Now that we have the training and validation data uploaded to S3, we can construct our XGBoost models and train them. Instead of training a single model, we will use SageMaker's hyperparameter tuning functionality to train multiple models and use the one that performs the best on the validation set.

To begin with, we will need to construct an estimator object.

6.3.1 Training & Testing using all offers together

In [68]:
# as stated above, we use this utility method to construct the image name for the training container.
container = get_image_uri(session.boto_region_name, 'xgboost')

# now that we know which container to use, we can construct the estimator object.
xgb = sagemaker.estimator.Estimator(container, # The name of the training container
                                    role,      # The IAM role to use (our current role in this case)
                                    train_instance_count=1, # The number of instances to use for training
                                    train_instance_type='ml.m4.xlarge', # The type of instance ot use for training
                                    output_path='s3://{}/{}/output'.format(session.default_bucket(), prefix),
                                                                        # Where to save the output (the model artifacts)
                                    sagemaker_session=session) # The current SageMaker session
'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
There is a more up to date SageMaker XGBoost image. To use the newer image, please set 'repo_version'='1.0-1'. For example:
	get_image_uri(region, 'xgboost', '1.0-1').
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.

Before beginning the hyperparameter tuning, we should make sure to set any model specific hyperparameters that we wish to have default values. There are quite a few that can be set when using the XGBoost algorithm, below are just a few of them. If you would like to change the hyperparameters below or modify additional ones you can find additional information on the XGBoost hyperparameter page

In [69]:
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        objective='binary:logistic',
                        early_stopping_rounds=10,
                        num_round=200)

Now that we have our estimator object completely set up, it is time to create the hyperparameter tuner. To do this we need to construct a new object which contains each of the parameters we want SageMaker to tune. In this case, we wish to find the best values for the max_depth, eta, min_child_weight, subsample, and gamma parameters. Note that for each parameter that we want SageMaker to tune we need to specify both the type of the parameter and the range of values that parameter may take on.

In addition, we specify the number of models to construct (max_jobs) and the number of those that can be trained in parallel (max_parallel_jobs). In the cell below we have chosen to train 20 models, of which we ask that SageMaker train 3 at a time in parallel. Note that this results in a total of 20 training jobs being executed which can take some time, in this case almost a half hour. With more complicated models this can take even longer so be aware!

In [70]:
from sagemaker.tuner import IntegerParameter, ContinuousParameter, HyperparameterTuner

xgb_hyperparameter_tuner = HyperparameterTuner(estimator = xgb, # The estimator object to use as the basis for the training jobs.
                                               objective_metric_name = 'validation:rmse', # The metric used to compare trained models.
                                               objective_type = 'Minimize', # Whether we wish to minimize or maximize the metric.
                                               max_jobs = 20, # The total number of models to train
                                               max_parallel_jobs = 3, # The number of models to train in parallel
                                               hyperparameter_ranges = {
                                                    'max_depth': IntegerParameter(3, 12),
                                                    'eta'      : ContinuousParameter(0.05, 0.5),
                                                    'min_child_weight': IntegerParameter(2, 8),
                                                    'subsample': ContinuousParameter(0.5, 0.9),
                                                    'gamma': ContinuousParameter(0, 10),
                                               })

Now that we have our hyperparameter tuner object completely set up, it is time to train it. To do this we make sure that SageMaker knows our input data is in csv format and then execute the fit method.

In [71]:
# this is a wrapper around the location of our train and validation data, to make sure that SageMaker
# knows our data is in csv format.
s3_input_train = sagemaker.s3_input(s3_data=train_location, content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data=val_location, content_type='csv')

xgb_hyperparameter_tuner.fit({'train': s3_input_train, 'validation': s3_input_validation})
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.

As in many of the examples we have seen so far, the fit() method takes care of setting up and fitting a number of different models, each with different hyperparameters. If we wish to wait for this process to finish, we can call the wait() method.

In [138]:
xgb_hyperparameter_tuner.wait()
..................................................................................................................................................................................................................................................................................................................................................................................!

Once the hyperamater tuner has finished, we can retrieve information about the best performing model.

In [139]:
xgb_hyperparameter_tuner.best_training_job()
Out[139]:
'xgboost-220212-2140-009-fbc78020'

In addition, since we'd like to set up a batch transform job to test the best model, we can construct a new estimator object from the results of the best training job. The xgb_attached object below can now be used as though we constructed an estimator with the best performing hyperparameters and then fit it to our training data.

In [70]:
xgb_attached = sagemaker.estimator.Estimator.attach(xgb_hyperparameter_tuner.best_training_job())
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
2022-02-12 21:52:27 Starting - Preparing the instances for training
2022-02-12 21:52:27 Downloading - Downloading input data
2022-02-12 21:52:27 Training - Training image download completed. Training in progress.
2022-02-12 21:52:27 Uploading - Uploading generated training model
2022-02-12 21:52:27 Completed - Training job completedArguments: train
[2022-02-12:21:52:13:INFO] Running standalone xgboost training.
[2022-02-12:21:52:13:INFO] Setting up HPO optimized metric to be : rmse
[2022-02-12:21:52:13:INFO] File size need to be processed in the node: 2.74mb. Available memory size in the node: 8527.95mb
[2022-02-12:21:52:13:INFO] Determined delimiter of CSV input is ','
[21:52:13] S3DistributionType set as FullyReplicated
[21:52:13] 21178x18 matrix with 381204 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2022-02-12:21:52:13:INFO] Determined delimiter of CSV input is ','
[21:52:13] S3DistributionType set as FullyReplicated
[21:52:13] 9116x18 matrix with 164088 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
[21:52:13] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 26 extra nodes, 86 pruned nodes, max_depth=6
[0]#011train-rmse:0.48829#011validation-rmse:0.487431
Multiple eval metrics have been passed: 'validation-rmse' will be used for early stopping.
Will train until validation-rmse hasn't improved in 10 rounds.
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 28 extra nodes, 76 pruned nodes, max_depth=6
[1]#011train-rmse:0.480162#011validation-rmse:0.478911
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 26 extra nodes, 84 pruned nodes, max_depth=6
[2]#011train-rmse:0.474606#011validation-rmse:0.472981
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 32 extra nodes, 72 pruned nodes, max_depth=6
[3]#011train-rmse:0.470662#011validation-rmse:0.468871
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 80 pruned nodes, max_depth=6
[4]#011train-rmse:0.467823#011validation-rmse:0.465742
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 30 extra nodes, 84 pruned nodes, max_depth=6
[5]#011train-rmse:0.465834#011validation-rmse:0.463617
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 78 pruned nodes, max_depth=6
[6]#011train-rmse:0.464402#011validation-rmse:0.462107
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 22 extra nodes, 86 pruned nodes, max_depth=6
[7]#011train-rmse:0.463363#011validation-rmse:0.46097
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 106 pruned nodes, max_depth=5
[8]#011train-rmse:0.462522#011validation-rmse:0.460013
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 16 extra nodes, 84 pruned nodes, max_depth=6
[9]#011train-rmse:0.461991#011validation-rmse:0.459447
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 100 pruned nodes, max_depth=3
[10]#011train-rmse:0.46173#011validation-rmse:0.459133
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 18 extra nodes, 94 pruned nodes, max_depth=5
[11]#011train-rmse:0.461124#011validation-rmse:0.458789
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 84 pruned nodes, max_depth=6
[12]#011train-rmse:0.460818#011validation-rmse:0.458498
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 80 pruned nodes, max_depth=2
[13]#011train-rmse:0.4607#011validation-rmse:0.458267
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 90 pruned nodes, max_depth=3
[14]#011train-rmse:0.460544#011validation-rmse:0.458086
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 82 pruned nodes, max_depth=1
[15]#011train-rmse:0.460498#011validation-rmse:0.458039
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 84 pruned nodes, max_depth=2
[16]#011train-rmse:0.460408#011validation-rmse:0.457968
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 104 pruned nodes, max_depth=5
[17]#011train-rmse:0.460172#011validation-rmse:0.457709
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 112 pruned nodes, max_depth=0
[18]#011train-rmse:0.460172#011validation-rmse:0.45771
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 98 pruned nodes, max_depth=0
[19]#011train-rmse:0.460172#011validation-rmse:0.457718
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 98 pruned nodes, max_depth=6
[20]#011train-rmse:0.459961#011validation-rmse:0.457584
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 64 pruned nodes, max_depth=5
[21]#011train-rmse:0.459768#011validation-rmse:0.45744
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 58 pruned nodes, max_depth=6
[22]#011train-rmse:0.459624#011validation-rmse:0.457336
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 90 pruned nodes, max_depth=4
[23]#011train-rmse:0.459511#011validation-rmse:0.457329
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 92 pruned nodes, max_depth=0
[24]#011train-rmse:0.45951#011validation-rmse:0.457322
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 82 pruned nodes, max_depth=0
[25]#011train-rmse:0.459511#011validation-rmse:0.457331
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 64 pruned nodes, max_depth=0
[26]#011train-rmse:0.45951#011validation-rmse:0.457326
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 84 pruned nodes, max_depth=3
[27]#011train-rmse:0.459441#011validation-rmse:0.457193
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 98 pruned nodes, max_depth=0
[28]#011train-rmse:0.459443#011validation-rmse:0.457199
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 66 pruned nodes, max_depth=3
[29]#011train-rmse:0.459305#011validation-rmse:0.457167
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 94 pruned nodes, max_depth=5
[30]#011train-rmse:0.459201#011validation-rmse:0.457008
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 98 pruned nodes, max_depth=5
[31]#011train-rmse:0.459097#011validation-rmse:0.456994
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 68 pruned nodes, max_depth=0
[32]#011train-rmse:0.459097#011validation-rmse:0.456988
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 62 pruned nodes, max_depth=6
[33]#011train-rmse:0.458913#011validation-rmse:0.456973
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 92 pruned nodes, max_depth=0
[34]#011train-rmse:0.458913#011validation-rmse:0.456972
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 76 pruned nodes, max_depth=6
[35]#011train-rmse:0.458831#011validation-rmse:0.456969
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 114 pruned nodes, max_depth=0
[36]#011train-rmse:0.458831#011validation-rmse:0.456966
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 68 pruned nodes, max_depth=0
[37]#011train-rmse:0.458832#011validation-rmse:0.456961
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 106 pruned nodes, max_depth=0
[38]#011train-rmse:0.458834#011validation-rmse:0.456958
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 112 pruned nodes, max_depth=0
[39]#011train-rmse:0.458831#011validation-rmse:0.456963
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 44 pruned nodes, max_depth=0
[40]#011train-rmse:0.458831#011validation-rmse:0.456966
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 72 pruned nodes, max_depth=5
[41]#011train-rmse:0.458723#011validation-rmse:0.456765
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 88 pruned nodes, max_depth=0
[42]#011train-rmse:0.458723#011validation-rmse:0.456765
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 60 pruned nodes, max_depth=6
[43]#011train-rmse:0.458545#011validation-rmse:0.456733
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 102 pruned nodes, max_depth=3
[44]#011train-rmse:0.458493#011validation-rmse:0.456742
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 110 pruned nodes, max_depth=0
[45]#011train-rmse:0.458494#011validation-rmse:0.456738
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 80 pruned nodes, max_depth=0
[46]#011train-rmse:0.458494#011validation-rmse:0.456737
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 60 pruned nodes, max_depth=0
[47]#011train-rmse:0.458495#011validation-rmse:0.456735
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 90 pruned nodes, max_depth=0
[48]#011train-rmse:0.458493#011validation-rmse:0.456738
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 54 pruned nodes, max_depth=0
[49]#011train-rmse:0.458493#011validation-rmse:0.456738
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 110 pruned nodes, max_depth=0
[50]#011train-rmse:0.458494#011validation-rmse:0.456737
[21:52:15] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 100 pruned nodes, max_depth=0
[51]#011train-rmse:0.458494#011validation-rmse:0.456736
[21:52:15] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 52 pruned nodes, max_depth=3
[52]#011train-rmse:0.458454#011validation-rmse:0.456738
[21:52:15] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 100 pruned nodes, max_depth=0
[53]#011train-rmse:0.458455#011validation-rmse:0.456736
Stopping. Best iteration:
[43]#011train-rmse:0.458545#011validation-rmse:0.456733
Training seconds: 78
Billable seconds: 78

Testing the model

Now that we have our best performing model, we can test it. To do this we will use the batch transform functionality. To start with, we need to build a transformer object from our fit model.

In [71]:
xgb_transformer = xgb_attached.transformer(instance_count = 1, instance_type = 'ml.m4.xlarge')
Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Using already existing model: xgboost-220212-2140-009-fbc78020

Next we ask SageMaker to begin a batch transform job using our trained model and applying it to the test data we previous stored in S3. We need to make sure to provide SageMaker with the type of data that we are providing to our model, in our case text/csv, so that it knows how to serialize our data. In addition, we need to make sure to let SageMaker know how to split our data up into chunks if the entire data set happens to be too large to send to our model all at once.

Note that when we ask SageMaker to do this it will execute the batch transform job in the background. Since we need to wait for the results of this job before we can continue, we use the wait() method. An added benefit of this is that we get some output from our batch transform job which lets us know if anything went wrong.

In [72]:
xgb_transformer.transform(test_location, content_type='text/csv', split_type='Line')

Currently the transform job is running but it is doing so in the background. Since we wish to wait until the transform job is done and we would like a bit of feedback we can run the wait() method.

In [73]:
xgb_transformer.wait()
................................Arguments: serve
[2022-02-14 11:09:37 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 11:09:37 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 11:09:37 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 11:09:37 +0000] [21] [INFO] Booting worker with pid: 21
[2022-02-14 11:09:37 +0000] [22] [INFO] Booting worker with pid: 22
[2022-02-14 11:09:37 +0000] [23] [INFO] Booting worker with pid: 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:09:37:INFO] Model loaded successfully for worker : 21
[2022-02-14:11:09:37:INFO] Model loaded successfully for worker : 22
[2022-02-14 11:09:37 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:09:37:INFO] Model loaded successfully for worker : 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:09:37:INFO] Model loaded successfully for worker : 24
[2022-02-14:11:09:41:INFO] Sniff delimiter as ','
[2022-02-14:11:09:41:INFO] Determined delimiter of CSV input is ','

2022-02-14T11:09:41.402:[sagemaker logs]: MaxConcurrentTransforms=4, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD

Now the transform job has executed and the result, the estimated sentiment of each review, has been saved on S3. Since we would rather work on this file locally we can perform a bit of notebook magic to copy the file to the data_dir.

In [76]:
!aws s3 cp --recursive $xgb_transformer.output_path $data_dir'/all_offers'
download: s3://sagemaker-us-east-1-218287629635/xgboost-220212-2140-009-fbc78020-2022-02-14-11-04-24-226/test.csv.out to data/all_offers/test.csv.out

The last step is now to read in the output from our model, convert the output to something a little more usable, in this case we want the sentiment to be either 1 (positive) or 0 (negative), and then compare to the ground truth labels.

In [77]:
predictions = pd.read_csv(os.path.join(data_dir, 'all_offers', 'test.csv.out'), header=None)
y_pred = [round(num) for num in predictions.squeeze().values]

accuracy_score(y_test, y_pred)
Out[77]:
0.6659052656888675
In [78]:
predictions.values
Out[78]:
array([[0.64045531],
       [0.66430914],
       [0.67150784],
       ...,
       [0.69443542],
       [0.55591857],
       [0.7079705 ]])

6.3.2 Training & Testing using bogo offers only

Same steps as above

In [72]:
# as stated above, we use this utility method to construct the image name for the training container.
container = get_image_uri(session.boto_region_name, 'xgboost')

# now that we know which container to use, we can construct the estimator object.
xgb = sagemaker.estimator.Estimator(container, # The name of the training container
                                    role,      # The IAM role to use (our current role in this case)
                                    train_instance_count=1, # The number of instances to use for training
                                    train_instance_type='ml.m4.xlarge', # The type of instance ot use for training
                                    output_path='s3://{}/{}/output'.format(session.default_bucket(), prefix),
                                                                        # Where to save the output (the model artifacts)
                                    sagemaker_session=session) # The current SageMaker session
'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
There is a more up to date SageMaker XGBoost image. To use the newer image, please set 'repo_version'='1.0-1'. For example:
	get_image_uri(region, 'xgboost', '1.0-1').
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
In [73]:
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        objective='binary:logistic',
                        early_stopping_rounds=10,
                        num_round=200)
In [74]:
from sagemaker.tuner import IntegerParameter, ContinuousParameter, HyperparameterTuner

xgb_hyperparameter_tuner = HyperparameterTuner(estimator = xgb, # The estimator object to use as the basis for the training jobs.
                                               objective_metric_name = 'validation:rmse', # The metric used to compare trained models.
                                               objective_type = 'Minimize', # Whether we wish to minimize or maximize the metric.
                                               max_jobs = 20, # The total number of models to train
                                               max_parallel_jobs = 3, # The number of models to train in parallel
                                               hyperparameter_ranges = {
                                                    'max_depth': IntegerParameter(3, 12),
                                                    'eta'      : ContinuousParameter(0.05, 0.5),
                                                    'min_child_weight': IntegerParameter(2, 8),
                                                    'subsample': ContinuousParameter(0.5, 0.9),
                                                    'gamma': ContinuousParameter(0, 10),
                                               })
In [75]:
# This is a wrapper around the location of our train and validation data, to make sure that SageMaker
# knows our data is in csv format.
s3_input_train = sagemaker.s3_input(s3_data=train_location_bogo, content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data=val_location_bogo, content_type='csv')

xgb_hyperparameter_tuner.fit({'train': s3_input_train, 'validation': s3_input_validation})
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
In [153]:
xgb_hyperparameter_tuner.wait()
............................................................................................................................................................................................................................................................................................................................................................!
In [154]:
xgb_hyperparameter_tuner.best_training_job()
Out[154]:
'xgboost-220212-2217-010-2fb6076e'
In [79]:
xgb_attached = sagemaker.estimator.Estimator.attach(xgb_hyperparameter_tuner.best_training_job())
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
2022-02-12 22:32:33 Starting - Preparing the instances for training
2022-02-12 22:32:33 Downloading - Downloading input data
2022-02-12 22:32:33 Training - Training image download completed. Training in progress.
2022-02-12 22:32:33 Uploading - Uploading generated training model
2022-02-12 22:32:33 Completed - Training job completedArguments: train
[2022-02-12:22:32:20:INFO] Running standalone xgboost training.
[2022-02-12:22:32:20:INFO] Setting up HPO optimized metric to be : rmse
[2022-02-12:22:32:20:INFO] File size need to be processed in the node: 0.96mb. Available memory size in the node: 8375.31mb
[2022-02-12:22:32:20:INFO] Determined delimiter of CSV input is ','
[22:32:20] S3DistributionType set as FullyReplicated
[22:32:20] 7942x15 matrix with 119130 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2022-02-12:22:32:20:INFO] Determined delimiter of CSV input is ','
[22:32:20] S3DistributionType set as FullyReplicated
[22:32:20] 3464x15 matrix with 51960 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 16 extra nodes, 72 pruned nodes, max_depth=4
[0]#011train-rmse:0.4935#011validation-rmse:0.493756
Multiple eval metrics have been passed: 'validation-rmse' will be used for early stopping.
Will train until validation-rmse hasn't improved in 10 rounds.
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 28 extra nodes, 60 pruned nodes, max_depth=6
[1]#011train-rmse:0.488017#011validation-rmse:0.489021
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 70 pruned nodes, max_depth=4
[2]#011train-rmse:0.484897#011validation-rmse:0.485948
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 18 extra nodes, 66 pruned nodes, max_depth=6
[3]#011train-rmse:0.48252#011validation-rmse:0.483869
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 62 pruned nodes, max_depth=2
[4]#011train-rmse:0.481348#011validation-rmse:0.482688
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 84 pruned nodes, max_depth=5
[5]#011train-rmse:0.480112#011validation-rmse:0.481424
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 62 pruned nodes, max_depth=4
[6]#011train-rmse:0.479387#011validation-rmse:0.480599
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 66 pruned nodes, max_depth=2
[7]#011train-rmse:0.479029#011validation-rmse:0.48022
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 58 pruned nodes, max_depth=4
[8]#011train-rmse:0.478106#011validation-rmse:0.479349
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 76 pruned nodes, max_depth=2
[9]#011train-rmse:0.477848#011validation-rmse:0.479014
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 66 pruned nodes, max_depth=3
[10]#011train-rmse:0.477642#011validation-rmse:0.478897
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 58 pruned nodes, max_depth=2
[11]#011train-rmse:0.477382#011validation-rmse:0.478691
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 58 pruned nodes, max_depth=3
[12]#011train-rmse:0.477153#011validation-rmse:0.478852
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 60 pruned nodes, max_depth=5
[13]#011train-rmse:0.47662#011validation-rmse:0.47863
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 50 pruned nodes, max_depth=0
[14]#011train-rmse:0.47662#011validation-rmse:0.478632
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 50 pruned nodes, max_depth=4
[15]#011train-rmse:0.476378#011validation-rmse:0.478457
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 18 extra nodes, 58 pruned nodes, max_depth=6
[16]#011train-rmse:0.475806#011validation-rmse:0.478195
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 58 pruned nodes, max_depth=0
[17]#011train-rmse:0.475807#011validation-rmse:0.478194
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 80 pruned nodes, max_depth=0
[18]#011train-rmse:0.47581#011validation-rmse:0.478192
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 70 pruned nodes, max_depth=0
[19]#011train-rmse:0.475809#011validation-rmse:0.478192
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 62 pruned nodes, max_depth=0
[20]#011train-rmse:0.475809#011validation-rmse:0.478192
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 82 pruned nodes, max_depth=0
[21]#011train-rmse:0.475806#011validation-rmse:0.478195
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 82 pruned nodes, max_depth=0
[22]#011train-rmse:0.475806#011validation-rmse:0.478197
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 56 pruned nodes, max_depth=0
[23]#011train-rmse:0.475806#011validation-rmse:0.478194
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 68 pruned nodes, max_depth=0
[24]#011train-rmse:0.475809#011validation-rmse:0.478192
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 96 pruned nodes, max_depth=0
[25]#011train-rmse:0.475807#011validation-rmse:0.478193
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 16 pruned nodes, max_depth=5
[26]#011train-rmse:0.475598#011validation-rmse:0.478206
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 32 pruned nodes, max_depth=6
[27]#011train-rmse:0.475279#011validation-rmse:0.478149
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 86 pruned nodes, max_depth=3
[28]#011train-rmse:0.475122#011validation-rmse:0.477953
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 66 pruned nodes, max_depth=0
[29]#011train-rmse:0.475128#011validation-rmse:0.477951
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 44 pruned nodes, max_depth=6
[30]#011train-rmse:0.474833#011validation-rmse:0.478008
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 100 pruned nodes, max_depth=4
[31]#011train-rmse:0.474681#011validation-rmse:0.477983
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 74 pruned nodes, max_depth=4
[32]#011train-rmse:0.474248#011validation-rmse:0.477832
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 54 pruned nodes, max_depth=0
[33]#011train-rmse:0.474249#011validation-rmse:0.477832
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 52 pruned nodes, max_depth=0
[34]#011train-rmse:0.474249#011validation-rmse:0.477832
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 60 pruned nodes, max_depth=0
[35]#011train-rmse:0.474246#011validation-rmse:0.477833
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 66 pruned nodes, max_depth=0
[36]#011train-rmse:0.474247#011validation-rmse:0.477833
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 62 pruned nodes, max_depth=0
[37]#011train-rmse:0.474245#011validation-rmse:0.477835
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 92 pruned nodes, max_depth=0
[38]#011train-rmse:0.474246#011validation-rmse:0.477842
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 46 pruned nodes, max_depth=0
[39]#011train-rmse:0.474246#011validation-rmse:0.477842
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 50 pruned nodes, max_depth=6
[40]#011train-rmse:0.473962#011validation-rmse:0.477898
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 62 pruned nodes, max_depth=0
[41]#011train-rmse:0.473956#011validation-rmse:0.4779
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 44 pruned nodes, max_depth=6
[42]#011train-rmse:0.473735#011validation-rmse:0.477887
Stopping. Best iteration:
[32]#011train-rmse:0.474248#011validation-rmse:0.477832
Training seconds: 62
Billable seconds: 62

Testing the model

In [80]:
xgb_transformer = xgb_attached.transformer(instance_count = 1, instance_type = 'ml.m4.xlarge')
Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Using already existing model: xgboost-220212-2217-010-2fb6076e
In [81]:
xgb_transformer.transform(test_location_bogo, content_type='text/csv', split_type='Line')
In [82]:
xgb_transformer.wait()
................................Arguments: serve
[2022-02-14 11:17:57 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 11:17:57 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 11:17:57 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 11:17:57 +0000] [21] [INFO] Booting worker with pid: 21
[2022-02-14 11:17:58 +0000] [22] [INFO] Booting worker with pid: 22
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)', 'urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:17:58:INFO] Model loaded successfully for worker : 21
[2022-02-14 11:17:58 +0000] [23] [INFO] Booting worker with pid: 23
[2022-02-14 11:17:58 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)', 'urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:17:58:INFO] Model loaded successfully for worker : 22
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)', 'urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:17:58:INFO] Model loaded successfully for worker : 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)', 'urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:17:58:INFO] Model loaded successfully for worker : 24
[2022-02-14:11:18:02:INFO] Sniff delimiter as ','
[2022-02-14:11:18:02:INFO] Determined delimiter of CSV input is ','

2022-02-14T11:18:02.248:[sagemaker logs]: MaxConcurrentTransforms=4, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
In [83]:
!aws s3 cp --recursive $xgb_transformer.output_path $data_dir'/bogo'
download: s3://sagemaker-us-east-1-218287629635/xgboost-220212-2217-010-2fb6076e-2022-02-14-11-12-41-859/test_bogo.csv.out to data/bogo/test_bogo.csv.out
In [85]:
predictions = pd.read_csv(os.path.join(data_dir, 'bogo', 'test_bogo.csv.out'), header=None)
y_pred = [round(num) for num in predictions.squeeze().values]

accuracy_score(y_test_bogo, y_pred)
Out[85]:
0.6328524195972347
In [86]:
predictions.values
Out[86]:
array([[0.63769585],
       [0.65476155],
       [0.68196064],
       ...,
       [0.42178869],
       [0.62916368],
       [0.71828973]])

6.3.3 Training & Testing using discount offers only

Same steps as above

In [76]:
# as stated above, we use this utility method to construct the image name for the training container.
container = get_image_uri(session.boto_region_name, 'xgboost')

# now that we know which container to use, we can construct the estimator object.
xgb = sagemaker.estimator.Estimator(container, # The name of the training container
                                    role,      # The IAM role to use (our current role in this case)
                                    train_instance_count=1, # The number of instances to use for training
                                    train_instance_type='ml.m4.xlarge', # The type of instance ot use for training
                                    output_path='s3://{}/{}/output'.format(session.default_bucket(), prefix),
                                                                        # Where to save the output (the model artifacts)
                                    sagemaker_session=session) # The current SageMaker session
'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
There is a more up to date SageMaker XGBoost image. To use the newer image, please set 'repo_version'='1.0-1'. For example:
	get_image_uri(region, 'xgboost', '1.0-1').
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
In [77]:
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        objective='binary:logistic',
                        early_stopping_rounds=10,
                        num_round=200)
In [78]:
from sagemaker.tuner import IntegerParameter, ContinuousParameter, HyperparameterTuner

xgb_hyperparameter_tuner = HyperparameterTuner(estimator = xgb, # The estimator object to use as the basis for the training jobs.
                                               objective_metric_name = 'validation:rmse', # The metric used to compare trained models.
                                               objective_type = 'Minimize', # Whether we wish to minimize or maximize the metric.
                                               max_jobs = 20, # The total number of models to train
                                               max_parallel_jobs = 3, # The number of models to train in parallel
                                               hyperparameter_ranges = {
                                                    'max_depth': IntegerParameter(3, 12),
                                                    'eta'      : ContinuousParameter(0.05, 0.5),
                                                    'min_child_weight': IntegerParameter(2, 8),
                                                    'subsample': ContinuousParameter(0.5, 0.9),
                                                    'gamma': ContinuousParameter(0, 10),
                                               })
In [79]:
# This is a wrapper around the location of our train and validation data, to make sure that SageMaker
# knows our data is in csv format.
s3_input_train = sagemaker.s3_input(s3_data=train_location_discount, content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data=val_location_discount, content_type='csv')

xgb_hyperparameter_tuner.fit({'train': s3_input_train, 'validation': s3_input_validation})
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
In [76]:
xgb_hyperparameter_tuner.wait()
..............................................................................................................................................................................................................................................................................................................................................................................................!

Once the hyperamater tuner has finished, we can retrieve information about the best performing model.

In [77]:
xgb_hyperparameter_tuner.best_training_job()
Out[77]:
'xgboost-220212-2350-004-7621399f'
In [87]:
xgb_attached = sagemaker.estimator.Estimator.attach(xgb_hyperparameter_tuner.best_training_job())
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
2022-02-12 23:59:33 Starting - Preparing the instances for training
2022-02-12 23:59:33 Downloading - Downloading input data
2022-02-12 23:59:33 Training - Training image download completed. Training in progress.
2022-02-12 23:59:33 Uploading - Uploading generated training model
2022-02-12 23:59:33 Completed - Training job completedArguments: train
[2022-02-12:23:59:21:INFO] Running standalone xgboost training.
[2022-02-12:23:59:21:INFO] Setting up HPO optimized metric to be : rmse
[2022-02-12:23:59:21:INFO] File size need to be processed in the node: 1.16mb. Available memory size in the node: 8378.34mb
[2022-02-12:23:59:21:INFO] Determined delimiter of CSV input is ','
[23:59:21] S3DistributionType set as FullyReplicated
[23:59:21] 8938x15 matrix with 134070 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2022-02-12:23:59:21:INFO] Determined delimiter of CSV input is ','
[23:59:21] S3DistributionType set as FullyReplicated
[23:59:21] 3926x15 matrix with 58890 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 18 extra nodes, 32 pruned nodes, max_depth=5
[0]#011train-rmse:0.465408#011validation-rmse:0.463833
Multiple eval metrics have been passed: 'validation-rmse' will be used for early stopping.
Will train until validation-rmse hasn't improved in 10 rounds.
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 20 extra nodes, 36 pruned nodes, max_depth=5
[1]#011train-rmse:0.449059#011validation-rmse:0.446383
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 38 pruned nodes, max_depth=5
[2]#011train-rmse:0.441086#011validation-rmse:0.437332
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 42 pruned nodes, max_depth=5
[3]#011train-rmse:0.437256#011validation-rmse:0.433075
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 20 extra nodes, 36 pruned nodes, max_depth=5
[4]#011train-rmse:0.43452#011validation-rmse:0.430274
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 52 pruned nodes, max_depth=3
[5]#011train-rmse:0.433491#011validation-rmse:0.428857
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 42 pruned nodes, max_depth=4
[6]#011train-rmse:0.432598#011validation-rmse:0.428239
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 52 pruned nodes, max_depth=0
[7]#011train-rmse:0.432595#011validation-rmse:0.428248
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 38 pruned nodes, max_depth=5
[8]#011train-rmse:0.432083#011validation-rmse:0.427454
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 50 pruned nodes, max_depth=2
[9]#011train-rmse:0.43179#011validation-rmse:0.427258
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 40 pruned nodes, max_depth=3
[10]#011train-rmse:0.431386#011validation-rmse:0.426834
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 44 pruned nodes, max_depth=0
[11]#011train-rmse:0.431385#011validation-rmse:0.426856
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 50 pruned nodes, max_depth=1
[12]#011train-rmse:0.431218#011validation-rmse:0.426649
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 38 pruned nodes, max_depth=4
[13]#011train-rmse:0.430927#011validation-rmse:0.426312
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 52 pruned nodes, max_depth=0
[14]#011train-rmse:0.430927#011validation-rmse:0.42629
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 56 pruned nodes, max_depth=0
[15]#011train-rmse:0.43093#011validation-rmse:0.426271
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 34 pruned nodes, max_depth=0
[16]#011train-rmse:0.43093#011validation-rmse:0.426271
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[17]#011train-rmse:0.430931#011validation-rmse:0.426265
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[18]#011train-rmse:0.430929#011validation-rmse:0.426273
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 58 pruned nodes, max_depth=1
[19]#011train-rmse:0.430839#011validation-rmse:0.426269
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 44 pruned nodes, max_depth=0
[20]#011train-rmse:0.43084#011validation-rmse:0.426263
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 46 pruned nodes, max_depth=0
[21]#011train-rmse:0.430842#011validation-rmse:0.426252
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 46 pruned nodes, max_depth=0
[22]#011train-rmse:0.43084#011validation-rmse:0.426262
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 40 pruned nodes, max_depth=2
[23]#011train-rmse:0.430617#011validation-rmse:0.426239
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 46 pruned nodes, max_depth=0
[24]#011train-rmse:0.430619#011validation-rmse:0.426218
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[25]#011train-rmse:0.430617#011validation-rmse:0.426229
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 50 pruned nodes, max_depth=4
[26]#011train-rmse:0.430288#011validation-rmse:0.426333
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 54 pruned nodes, max_depth=0
[27]#011train-rmse:0.430287#011validation-rmse:0.426345
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 40 pruned nodes, max_depth=0
[28]#011train-rmse:0.430288#011validation-rmse:0.426366
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 40 pruned nodes, max_depth=0
[29]#011train-rmse:0.43029#011validation-rmse:0.426381
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 42 pruned nodes, max_depth=0
[30]#011train-rmse:0.430287#011validation-rmse:0.42634
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 36 pruned nodes, max_depth=0
[31]#011train-rmse:0.430287#011validation-rmse:0.426344
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 34 pruned nodes, max_depth=5
[32]#011train-rmse:0.429923#011validation-rmse:0.426356
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 32 pruned nodes, max_depth=0
[33]#011train-rmse:0.429918#011validation-rmse:0.426374
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 42 pruned nodes, max_depth=0
[34]#011train-rmse:0.429916#011validation-rmse:0.4264
Stopping. Best iteration:
[24]#011train-rmse:0.430619#011validation-rmse:0.426218
Training seconds: 62
Billable seconds: 62

Testing the model

In [88]:
xgb_transformer = xgb_attached.transformer(instance_count = 1, instance_type = 'ml.m4.xlarge')
Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Using already existing model: xgboost-220212-2350-004-7621399f
In [89]:
xgb_transformer.transform(test_location_discount, content_type='text/csv', split_type='Line')
In [90]:
xgb_transformer.wait()
...................................Arguments: serve
[2022-02-14 11:25:04 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 11:25:04 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 11:25:04 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 11:25:04 +0000] [21] [INFO] Booting worker with pid: 21
[2022-02-14 11:25:05 +0000] [22] [INFO] Booting worker with pid: 22
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:25:05:INFO] Model loaded successfully for worker : 21
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:25:05:INFO] Model loaded successfully for worker : 22
Arguments: serve
[2022-02-14 11:25:04 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 11:25:04 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 11:25:04 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 11:25:04 +0000] [21] [INFO] Booting worker with pid: 21
[2022-02-14 11:25:05 +0000] [22] [INFO] Booting worker with pid: 22
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:25:05:INFO] Model loaded successfully for worker : 21
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:25:05:INFO] Model loaded successfully for worker : 22
[2022-02-14 11:25:05 +0000] [23] [INFO] Booting worker with pid: 23
[2022-02-14 11:25:05 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:25:05:INFO] Model loaded successfully for worker : 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:25:05:INFO] Model loaded successfully for worker : 24
[2022-02-14 11:25:05 +0000] [23] [INFO] Booting worker with pid: 23
[2022-02-14 11:25:05 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:25:05:INFO] Model loaded successfully for worker : 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:25:05:INFO] Model loaded successfully for worker : 24
[2022-02-14:11:25:09:INFO] Sniff delimiter as ','
[2022-02-14:11:25:09:INFO] Determined delimiter of CSV input is ','
[2022-02-14:11:25:09:INFO] Sniff delimiter as ','
[2022-02-14:11:25:09:INFO] Determined delimiter of CSV input is ','
2022-02-14T11:25:09.118:[sagemaker logs]: MaxConcurrentTransforms=4, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD

In [91]:
!aws s3 cp --recursive $xgb_transformer.output_path $data_dir'/discount'
download: s3://sagemaker-us-east-1-218287629635/xgboost-220212-2350-004-7621399f-2022-02-14-11-19-25-387/test_discount.csv.out to data/discount/test_discount.csv.out
In [93]:
predictions = pd.read_csv(os.path.join(data_dir, 'discount', 'test_discount.csv.out'), header=None)
y_pred = [round(num) for num in predictions.squeeze().values]

accuracy_score(y_test_discount, y_pred)
Out[93]:
0.7248781973203411
In [94]:
predictions.values
Out[94]:
array([[0.67545527],
       [0.75578934],
       [0.49818239],
       ...,
       [0.39152786],
       [0.83544475],
       [0.67408472]])

6.3.4 Training & Testing using informational offers only

Same steps as above

In [80]:
# as stated above, we use this utility method to construct the image name for the training container.
container = get_image_uri(session.boto_region_name, 'xgboost')

# now that we know which container to use, we can construct the estimator object.
xgb = sagemaker.estimator.Estimator(container, # The name of the training container
                                    role,      # The IAM role to use (our current role in this case)
                                    train_instance_count=1, # The number of instances to use for training
                                    train_instance_type='ml.m4.xlarge', # The type of instance ot use for training
                                    output_path='s3://{}/{}/output'.format(session.default_bucket(), prefix),
                                                                        # Where to save the output (the model artifacts)
                                    sagemaker_session=session) # The current SageMaker session
'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
There is a more up to date SageMaker XGBoost image. To use the newer image, please set 'repo_version'='1.0-1'. For example:
	get_image_uri(region, 'xgboost', '1.0-1').
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
In [81]:
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        objective='binary:logistic',
                        early_stopping_rounds=10,
                        num_round=200)
In [82]:
from sagemaker.tuner import IntegerParameter, ContinuousParameter, HyperparameterTuner

xgb_hyperparameter_tuner = HyperparameterTuner(estimator = xgb, # The estimator object to use as the basis for the training jobs.
                                               objective_metric_name = 'validation:rmse', # The metric used to compare trained models.
                                               objective_type = 'Minimize', # Whether we wish to minimize or maximize the metric.
                                               max_jobs = 20, # The total number of models to train
                                               max_parallel_jobs = 3, # The number of models to train in parallel
                                               hyperparameter_ranges = {
                                                    'max_depth': IntegerParameter(3, 12),
                                                    'eta'      : ContinuousParameter(0.05, 0.5),
                                                    'min_child_weight': IntegerParameter(2, 8),
                                                    'subsample': ContinuousParameter(0.5, 0.9),
                                                    'gamma': ContinuousParameter(0, 10),
                                               })
In [94]:
# This is a wrapper around the location of our train and validation data, to make sure that SageMaker
# knows our data is in csv format.
s3_input_train = sagemaker.s3_input(s3_data=train_location_informational, content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data=val_location_informational, content_type='csv')

xgb_hyperparameter_tuner.fit({'train': s3_input_train, 'validation': s3_input_validation})
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
In [89]:
xgb_hyperparameter_tuner.wait()
......................................................................................................................................................................................................................................................................................................................................................................................................!
In [90]:
xgb_hyperparameter_tuner.best_training_job()
Out[90]:
'xgboost-220213-0029-020-84539665'
In [95]:
xgb_attached = sagemaker.estimator.Estimator.attach(xgb_hyperparameter_tuner.best_training_job())
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
2022-02-13 01:01:11 Starting - Preparing the instances for training
2022-02-13 01:01:11 Downloading - Downloading input data
2022-02-13 01:01:11 Training - Training image download completed. Training in progress.
2022-02-13 01:01:11 Uploading - Uploading generated training model
2022-02-13 01:01:11 Completed - Training job completedArguments: train
[2022-02-13:01:01:01:INFO] Running standalone xgboost training.
[2022-02-13:01:01:01:INFO] Setting up HPO optimized metric to be : rmse
[2022-02-13:01:01:01:INFO] File size need to be processed in the node: 0.46mb. Available memory size in the node: 8359.67mb
[2022-02-13:01:01:01:INFO] Determined delimiter of CSV input is ','
[01:01:01] S3DistributionType set as FullyReplicated
[01:01:01] 4298x15 matrix with 64470 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2022-02-13:01:01:01:INFO] Determined delimiter of CSV input is ','
[01:01:01] S3DistributionType set as FullyReplicated
[01:01:01] 1726x15 matrix with 25890 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 34 pruned nodes, max_depth=5
[0]#011train-rmse:0.496345#011validation-rmse:0.496172
Multiple eval metrics have been passed: 'validation-rmse' will be used for early stopping.
Will train until validation-rmse hasn't improved in 10 rounds.
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 38 pruned nodes, max_depth=5
[1]#011train-rmse:0.493471#011validation-rmse:0.493349
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 42 pruned nodes, max_depth=4
[2]#011train-rmse:0.491422#011validation-rmse:0.491249
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 38 pruned nodes, max_depth=5
[3]#011train-rmse:0.489614#011validation-rmse:0.489519
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 44 pruned nodes, max_depth=2
[4]#011train-rmse:0.488548#011validation-rmse:0.488475
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 44 pruned nodes, max_depth=4
[5]#011train-rmse:0.487491#011validation-rmse:0.487387
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 40 pruned nodes, max_depth=3
[6]#011train-rmse:0.486753#011validation-rmse:0.486729
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 32 pruned nodes, max_depth=5
[7]#011train-rmse:0.485859#011validation-rmse:0.485986
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 48 pruned nodes, max_depth=1
[8]#011train-rmse:0.485608#011validation-rmse:0.485639
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 46 pruned nodes, max_depth=1
[9]#011train-rmse:0.485401#011validation-rmse:0.485344
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 50 pruned nodes, max_depth=5
[10]#011train-rmse:0.484904#011validation-rmse:0.484827
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 32 pruned nodes, max_depth=3
[11]#011train-rmse:0.484575#011validation-rmse:0.484439
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 50 pruned nodes, max_depth=0
[12]#011train-rmse:0.484575#011validation-rmse:0.484439
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 52 pruned nodes, max_depth=1
[13]#011train-rmse:0.484457#011validation-rmse:0.484248
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[14]#011train-rmse:0.484458#011validation-rmse:0.484249
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 20 pruned nodes, max_depth=5
[15]#011train-rmse:0.484022#011validation-rmse:0.48381
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 48 pruned nodes, max_depth=2
[16]#011train-rmse:0.483844#011validation-rmse:0.483719
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 44 pruned nodes, max_depth=0
[17]#011train-rmse:0.483843#011validation-rmse:0.483718
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 34 pruned nodes, max_depth=0
[18]#011train-rmse:0.483844#011validation-rmse:0.48372
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[19]#011train-rmse:0.483844#011validation-rmse:0.48372
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 46 pruned nodes, max_depth=4
[20]#011train-rmse:0.483526#011validation-rmse:0.483405
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[21]#011train-rmse:0.483526#011validation-rmse:0.483405
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[22]#011train-rmse:0.483529#011validation-rmse:0.48341
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[23]#011train-rmse:0.483531#011validation-rmse:0.483414
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[24]#011train-rmse:0.48353#011validation-rmse:0.483412
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[25]#011train-rmse:0.483529#011validation-rmse:0.48341
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 44 pruned nodes, max_depth=0
[26]#011train-rmse:0.483528#011validation-rmse:0.483409
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 38 pruned nodes, max_depth=4
[27]#011train-rmse:0.483265#011validation-rmse:0.483341
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 26 pruned nodes, max_depth=0
[28]#011train-rmse:0.483266#011validation-rmse:0.483342
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 28 pruned nodes, max_depth=0
[29]#011train-rmse:0.483263#011validation-rmse:0.483337
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 24 pruned nodes, max_depth=4
[30]#011train-rmse:0.482989#011validation-rmse:0.483138
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[31]#011train-rmse:0.482989#011validation-rmse:0.483138
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 32 pruned nodes, max_depth=0
[32]#011train-rmse:0.482989#011validation-rmse:0.483139
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 34 pruned nodes, max_depth=0
[33]#011train-rmse:0.48299#011validation-rmse:0.483141
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 40 pruned nodes, max_depth=0
[34]#011train-rmse:0.482989#011validation-rmse:0.483139
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 40 pruned nodes, max_depth=0
[35]#011train-rmse:0.48299#011validation-rmse:0.48314
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 30 pruned nodes, max_depth=0
[36]#011train-rmse:0.48299#011validation-rmse:0.483142
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 32 pruned nodes, max_depth=0
[37]#011train-rmse:0.482991#011validation-rmse:0.483143
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 50 pruned nodes, max_depth=0
[38]#011train-rmse:0.48299#011validation-rmse:0.483141
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 40 pruned nodes, max_depth=4
[39]#011train-rmse:0.482756#011validation-rmse:0.483091
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 36 pruned nodes, max_depth=0
[40]#011train-rmse:0.482756#011validation-rmse:0.483092
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[41]#011train-rmse:0.482755#011validation-rmse:0.483091
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[42]#011train-rmse:0.482756#011validation-rmse:0.483092
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 24 pruned nodes, max_depth=4
[43]#011train-rmse:0.482469#011validation-rmse:0.483037
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 36 pruned nodes, max_depth=0
[44]#011train-rmse:0.482469#011validation-rmse:0.483036
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 34 pruned nodes, max_depth=0
[45]#011train-rmse:0.482469#011validation-rmse:0.483036
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 42 pruned nodes, max_depth=5
[46]#011train-rmse:0.481998#011validation-rmse:0.482971
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 36 pruned nodes, max_depth=0
[47]#011train-rmse:0.481997#011validation-rmse:0.482969
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 42 pruned nodes, max_depth=0
[48]#011train-rmse:0.481998#011validation-rmse:0.48297
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 26 pruned nodes, max_depth=0
[49]#011train-rmse:0.481998#011validation-rmse:0.482971
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 26 pruned nodes, max_depth=0
[50]#011train-rmse:0.481998#011validation-rmse:0.482971
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 46 pruned nodes, max_depth=4
[51]#011train-rmse:0.481743#011validation-rmse:0.482996
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 28 pruned nodes, max_depth=0
[52]#011train-rmse:0.481744#011validation-rmse:0.482997
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 46 pruned nodes, max_depth=0
[53]#011train-rmse:0.481744#011validation-rmse:0.482998
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 32 pruned nodes, max_depth=0
[54]#011train-rmse:0.481743#011validation-rmse:0.482996
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 34 pruned nodes, max_depth=0
[55]#011train-rmse:0.481743#011validation-rmse:0.482996
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 28 pruned nodes, max_depth=0
[56]#011train-rmse:0.481743#011validation-rmse:0.482994
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 30 pruned nodes, max_depth=0
[57]#011train-rmse:0.481744#011validation-rmse:0.482997
Stopping. Best iteration:
[47]#011train-rmse:0.481997#011validation-rmse:0.482969
Training seconds: 72
Billable seconds: 72

Testing the model

In [96]:
xgb_transformer = xgb_attached.transformer(instance_count = 1, instance_type = 'ml.m4.xlarge')
Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Using already existing model: xgboost-220213-0029-020-84539665
In [97]:
xgb_transformer.transform(test_location_informational, content_type='text/csv', split_type='Line')
In [98]:
xgb_transformer.wait()
.....................................Arguments: serve
[2022-02-14 11:33:44 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 11:33:44 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 11:33:44 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 11:33:44 +0000] [21] [INFO] Booting worker with pid: 21
[2022-02-14 11:33:44 +0000] [22] [INFO] Booting worker with pid: 22
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:33:44:INFO] Model loaded successfully for worker : 21
[2022-02-14 11:33:44 +0000] [23] [INFO] Booting worker with pid: 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:33:44:INFO] Model loaded successfully for worker : 22
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:33:44:INFO] Model loaded successfully for worker : 23
[2022-02-14 11:33:44 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:33:44:INFO] Model loaded successfully for worker : 24
[2022-02-14:11:33:48:INFO] Sniff delimiter as ','
[2022-02-14:11:33:48:INFO] Determined delimiter of CSV input is ','
[2022-02-14:11:33:48:INFO] Sniff delimiter as ','
[2022-02-14:11:33:48:INFO] Determined delimiter of CSV input is ','
2022-02-14T11:33:48.263:[sagemaker logs]: MaxConcurrentTransforms=4, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD

In [99]:
!aws s3 cp --recursive $xgb_transformer.output_path $data_dir'/informational'
download: s3://sagemaker-us-east-1-218287629635/xgboost-220213-0029-020-84539665-2022-02-14-11-27-35-975/test_informational.csv.out to data/informational/test_informational.csv.out
In [100]:
predictions = pd.read_csv(os.path.join(data_dir, 'informational', 'test_informational.csv.out'), header=None)
y_pred = [round(num) for num in predictions.squeeze().values]

accuracy_score(y_test_informational, y_pred)
Out[100]:
0.5746924428822495
In [101]:
predictions.values
Out[101]:
array([[0.56090522],
       [0.48902473],
       [0.60453463],
       ...,
       [0.48171777],
       [0.36625379],
       [0.47080207]])

7. Conclusion - Use Case

7.1 Select a random testing sample for the use case

In [84]:
X_test_use_case = X_test.sample(n=100, random_state=9)
X_test_use_case_no_offer_info = X_test_use_case.drop(columns=['bogo', 'discount', 'informational'])
y_test_use_case = y_test.loc[X_test_use_case.index]

X_test_use_case.shape, X_test_use_case_no_offer_info.shape,  y_test_use_case.shape
Out[84]:
((100, 18), (100, 15), (100,))

7.2: Uploading the data files to S3

When a training job is constructed using SageMaker, a container is executed which performs the training operation. This container is given access to data that is stored in S3. This means that we need to upload the data we want to use for training to S3. In addition, when we perform a batch transform job, SageMaker expects the input data to be stored on S3. We can use the SageMaker API to do this and hide some of the details.

Save the data locally

First we need to create the test, train and validation csv files which we will then upload to S3.

In [85]:
X_test_use_case.to_csv(os.path.join(data_dir, 'use_case', 'test_use_case.csv'), header=False, index=False)
X_test_use_case_no_offer_info.to_csv(os.path.join(data_dir, 'use_case',  'test_use_case_no_offer_info.csv'), header=False, index=False)

Upload to S3

Since we are currently running inside of a SageMaker session, we can use the object which represents this session to upload our data to the 'default' S3 bucket. Note that it is good practice to provide a custom prefix (essentially an S3 folder) to make sure that you don't accidentally interfere with data uploaded from some other notebook or project.

In [86]:
prefix = 'starbucks-xgboost'

test_location_use_case = session.upload_data(os.path.join(data_dir, 'use_case', 'test_use_case.csv'), key_prefix=prefix)

test_location_use_case_no_offer_info = session.upload_data(os.path.join(data_dir, 'use_case', 'test_use_case_no_offer_info.csv'), key_prefix=prefix)

7.3 Get predictions on the sample from all XGBoost models

7.3.1 All offers model's predictions (we are able to test performance on this one too)

In [105]:
xgb_attached = sagemaker.estimator.Estimator.attach('xgboost-220212-2140-009-fbc78020') # best model's details have been taken from 6.3.1 section
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
2022-02-12 21:52:27 Starting - Preparing the instances for training
2022-02-12 21:52:27 Downloading - Downloading input data
2022-02-12 21:52:27 Training - Training image download completed. Training in progress.
2022-02-12 21:52:27 Uploading - Uploading generated training model
2022-02-12 21:52:27 Completed - Training job completedArguments: train
[2022-02-12:21:52:13:INFO] Running standalone xgboost training.
[2022-02-12:21:52:13:INFO] Setting up HPO optimized metric to be : rmse
[2022-02-12:21:52:13:INFO] File size need to be processed in the node: 2.74mb. Available memory size in the node: 8527.95mb
[2022-02-12:21:52:13:INFO] Determined delimiter of CSV input is ','
[21:52:13] S3DistributionType set as FullyReplicated
[21:52:13] 21178x18 matrix with 381204 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2022-02-12:21:52:13:INFO] Determined delimiter of CSV input is ','
[21:52:13] S3DistributionType set as FullyReplicated
[21:52:13] 9116x18 matrix with 164088 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
[21:52:13] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 26 extra nodes, 86 pruned nodes, max_depth=6
[0]#011train-rmse:0.48829#011validation-rmse:0.487431
Multiple eval metrics have been passed: 'validation-rmse' will be used for early stopping.
Will train until validation-rmse hasn't improved in 10 rounds.
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 28 extra nodes, 76 pruned nodes, max_depth=6
[1]#011train-rmse:0.480162#011validation-rmse:0.478911
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 26 extra nodes, 84 pruned nodes, max_depth=6
[2]#011train-rmse:0.474606#011validation-rmse:0.472981
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 32 extra nodes, 72 pruned nodes, max_depth=6
[3]#011train-rmse:0.470662#011validation-rmse:0.468871
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 80 pruned nodes, max_depth=6
[4]#011train-rmse:0.467823#011validation-rmse:0.465742
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 30 extra nodes, 84 pruned nodes, max_depth=6
[5]#011train-rmse:0.465834#011validation-rmse:0.463617
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 78 pruned nodes, max_depth=6
[6]#011train-rmse:0.464402#011validation-rmse:0.462107
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 22 extra nodes, 86 pruned nodes, max_depth=6
[7]#011train-rmse:0.463363#011validation-rmse:0.46097
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 106 pruned nodes, max_depth=5
[8]#011train-rmse:0.462522#011validation-rmse:0.460013
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 16 extra nodes, 84 pruned nodes, max_depth=6
[9]#011train-rmse:0.461991#011validation-rmse:0.459447
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 100 pruned nodes, max_depth=3
[10]#011train-rmse:0.46173#011validation-rmse:0.459133
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 18 extra nodes, 94 pruned nodes, max_depth=5
[11]#011train-rmse:0.461124#011validation-rmse:0.458789
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 84 pruned nodes, max_depth=6
[12]#011train-rmse:0.460818#011validation-rmse:0.458498
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 80 pruned nodes, max_depth=2
[13]#011train-rmse:0.4607#011validation-rmse:0.458267
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 90 pruned nodes, max_depth=3
[14]#011train-rmse:0.460544#011validation-rmse:0.458086
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 82 pruned nodes, max_depth=1
[15]#011train-rmse:0.460498#011validation-rmse:0.458039
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 84 pruned nodes, max_depth=2
[16]#011train-rmse:0.460408#011validation-rmse:0.457968
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 104 pruned nodes, max_depth=5
[17]#011train-rmse:0.460172#011validation-rmse:0.457709
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 112 pruned nodes, max_depth=0
[18]#011train-rmse:0.460172#011validation-rmse:0.45771
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 98 pruned nodes, max_depth=0
[19]#011train-rmse:0.460172#011validation-rmse:0.457718
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 98 pruned nodes, max_depth=6
[20]#011train-rmse:0.459961#011validation-rmse:0.457584
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 64 pruned nodes, max_depth=5
[21]#011train-rmse:0.459768#011validation-rmse:0.45744
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 58 pruned nodes, max_depth=6
[22]#011train-rmse:0.459624#011validation-rmse:0.457336
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 90 pruned nodes, max_depth=4
[23]#011train-rmse:0.459511#011validation-rmse:0.457329
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 92 pruned nodes, max_depth=0
[24]#011train-rmse:0.45951#011validation-rmse:0.457322
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 82 pruned nodes, max_depth=0
[25]#011train-rmse:0.459511#011validation-rmse:0.457331
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 64 pruned nodes, max_depth=0
[26]#011train-rmse:0.45951#011validation-rmse:0.457326
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 84 pruned nodes, max_depth=3
[27]#011train-rmse:0.459441#011validation-rmse:0.457193
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 98 pruned nodes, max_depth=0
[28]#011train-rmse:0.459443#011validation-rmse:0.457199
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 66 pruned nodes, max_depth=3
[29]#011train-rmse:0.459305#011validation-rmse:0.457167
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 94 pruned nodes, max_depth=5
[30]#011train-rmse:0.459201#011validation-rmse:0.457008
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 98 pruned nodes, max_depth=5
[31]#011train-rmse:0.459097#011validation-rmse:0.456994
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 68 pruned nodes, max_depth=0
[32]#011train-rmse:0.459097#011validation-rmse:0.456988
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 62 pruned nodes, max_depth=6
[33]#011train-rmse:0.458913#011validation-rmse:0.456973
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 92 pruned nodes, max_depth=0
[34]#011train-rmse:0.458913#011validation-rmse:0.456972
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 76 pruned nodes, max_depth=6
[35]#011train-rmse:0.458831#011validation-rmse:0.456969
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 114 pruned nodes, max_depth=0
[36]#011train-rmse:0.458831#011validation-rmse:0.456966
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 68 pruned nodes, max_depth=0
[37]#011train-rmse:0.458832#011validation-rmse:0.456961
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 106 pruned nodes, max_depth=0
[38]#011train-rmse:0.458834#011validation-rmse:0.456958
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 112 pruned nodes, max_depth=0
[39]#011train-rmse:0.458831#011validation-rmse:0.456963
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 44 pruned nodes, max_depth=0
[40]#011train-rmse:0.458831#011validation-rmse:0.456966
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 72 pruned nodes, max_depth=5
[41]#011train-rmse:0.458723#011validation-rmse:0.456765
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 88 pruned nodes, max_depth=0
[42]#011train-rmse:0.458723#011validation-rmse:0.456765
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 60 pruned nodes, max_depth=6
[43]#011train-rmse:0.458545#011validation-rmse:0.456733
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 102 pruned nodes, max_depth=3
[44]#011train-rmse:0.458493#011validation-rmse:0.456742
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 110 pruned nodes, max_depth=0
[45]#011train-rmse:0.458494#011validation-rmse:0.456738
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 80 pruned nodes, max_depth=0
[46]#011train-rmse:0.458494#011validation-rmse:0.456737
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 60 pruned nodes, max_depth=0
[47]#011train-rmse:0.458495#011validation-rmse:0.456735
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 90 pruned nodes, max_depth=0
[48]#011train-rmse:0.458493#011validation-rmse:0.456738
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 54 pruned nodes, max_depth=0
[49]#011train-rmse:0.458493#011validation-rmse:0.456738
[21:52:14] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 110 pruned nodes, max_depth=0
[50]#011train-rmse:0.458494#011validation-rmse:0.456737
[21:52:15] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 100 pruned nodes, max_depth=0
[51]#011train-rmse:0.458494#011validation-rmse:0.456736
[21:52:15] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 52 pruned nodes, max_depth=3
[52]#011train-rmse:0.458454#011validation-rmse:0.456738
[21:52:15] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 100 pruned nodes, max_depth=0
[53]#011train-rmse:0.458455#011validation-rmse:0.456736
Stopping. Best iteration:
[43]#011train-rmse:0.458545#011validation-rmse:0.456733
Training seconds: 78
Billable seconds: 78

Testing the model

Now that we have our best performing model, we can test it. To do this we will use the batch transform functionality. To start with, we need to build a transformer object from our fit model.

In [106]:
xgb_transformer = xgb_attached.transformer(instance_count = 1, instance_type = 'ml.m4.xlarge')
Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Using already existing model: xgboost-220212-2140-009-fbc78020

Next we ask SageMaker to begin a batch transform job using our trained model and applying it to the test data we previous stored in S3. We need to make sure to provide SageMaker with the type of data that we are providing to our model, in our case text/csv, so that it knows how to serialize our data. In addition, we need to make sure to let SageMaker know how to split our data up into chunks if the entire data set happens to be too large to send to our model all at once.

Note that when we ask SageMaker to do this it will execute the batch transform job in the background. Since we need to wait for the results of this job before we can continue, we use the wait() method. An added benefit of this is that we get some output from our batch transform job which lets us know if anything went wrong.

In [107]:
xgb_transformer.transform(test_location_use_case, content_type='text/csv', split_type='Line')

Currently the transform job is running but it is doing so in the background. Since we wish to wait until the transform job is done and we would like a bit of feedback we can run the wait() method.

In [108]:
xgb_transformer.wait()
...................................Arguments: serve
[2022-02-14 11:44:15 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 11:44:15 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 11:44:15 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 11:44:15 +0000] [21] [INFO] Booting worker with pid: 21
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:44:15:INFO] Model loaded successfully for worker : 21
[2022-02-14 11:44:15 +0000] [22] [INFO] Booting worker with pid: 22
[2022-02-14 11:44:15 +0000] [23] [INFO] Booting worker with pid: 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:44:15:INFO] Model loaded successfully for worker : 22
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:44:15:INFO] Model loaded successfully for worker : 23
[2022-02-14 11:44:15 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:44:15:INFO] Model loaded successfully for worker : 24

[2022-02-14:11:44:19:INFO] Sniff delimiter as ','
[2022-02-14:11:44:19:INFO] Determined delimiter of CSV input is ','
[2022-02-14:11:44:19:INFO] Sniff delimiter as ','
[2022-02-14:11:44:19:INFO] Determined delimiter of CSV input is ','
2022-02-14T11:44:19.217:[sagemaker logs]: MaxConcurrentTransforms=4, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD

Now the transform job has executed and the result, the estimated sentiment of each review, has been saved on S3. Since we would rather work on this file locally we can perform a bit of notebook magic to copy the file to the data_dir.

In [110]:
!aws s3 cp --recursive $xgb_transformer.output_path $data_dir'/all_offers'
download: s3://sagemaker-us-east-1-218287629635/xgboost-220212-2140-009-fbc78020-2022-02-14-11-38-20-492/test_use_case.csv.out to data/all_offers/test_use_case.csv.out

The last step is now to read in the output from our model, convert the output to something a little more usable, in this case we want the sentiment to be either 1 (positive) or 0 (negative), and then compare to the ground truth labels.

In [88]:
predictions_all_offers = pd.read_csv(os.path.join(data_dir, 'all_offers', 'test_use_case.csv.out'), header=None)
y_pred = [round(num) for num in predictions_all_offers.squeeze().values]

accuracy_score(y_test_use_case, y_pred)
Out[88]:
0.82
In [89]:
predictions_all_offers.values
Out[89]:
array([[0.41994193],
       [0.55339468],
       [0.55773371],
       [0.66432804],
       [0.59391397],
       [0.54830337],
       [0.56131923],
       [0.58975726],
       [0.80908704],
       [0.66973031],
       [0.73824084],
       [0.75442034],
       [0.71881962],
       [0.7219401 ],
       [0.59437901],
       [0.2799696 ],
       [0.66916209],
       [0.3456594 ],
       [0.74945581],
       [0.12214141],
       [0.67614251],
       [0.5949775 ],
       [0.29283041],
       [0.51029485],
       [0.73645264],
       [0.68387204],
       [0.33351904],
       [0.44922641],
       [0.81638557],
       [0.25534439],
       [0.13545899],
       [0.31260759],
       [0.76029193],
       [0.35004869],
       [0.49484706],
       [0.69712514],
       [0.2907908 ],
       [0.76972264],
       [0.58697569],
       [0.82632327],
       [0.38912141],
       [0.48527515],
       [0.62309486],
       [0.53452593],
       [0.64636493],
       [0.515347  ],
       [0.48377913],
       [0.32832977],
       [0.5070433 ],
       [0.55692643],
       [0.1801253 ],
       [0.75693512],
       [0.65362298],
       [0.25500777],
       [0.58091259],
       [0.48218405],
       [0.66250753],
       [0.41658869],
       [0.34965664],
       [0.20733479],
       [0.13961494],
       [0.52044004],
       [0.67614251],
       [0.15415572],
       [0.70775175],
       [0.74171627],
       [0.68471611],
       [0.17658095],
       [0.55089551],
       [0.64275455],
       [0.78099447],
       [0.30731112],
       [0.59070933],
       [0.47678658],
       [0.61642283],
       [0.55089551],
       [0.55426997],
       [0.71627045],
       [0.48181328],
       [0.2837339 ],
       [0.19539368],
       [0.39077306],
       [0.45213178],
       [0.52049512],
       [0.63594317],
       [0.52175993],
       [0.77763551],
       [0.22444874],
       [0.8456803 ],
       [0.44759926],
       [0.38561252],
       [0.36335063],
       [0.81419051],
       [0.61316669],
       [0.72371274],
       [0.70715946],
       [0.59659827],
       [0.81569469],
       [0.54321891],
       [0.55339468]])

7.3.2 Bogo model's predidictions

In [114]:
xgb_attached = sagemaker.estimator.Estimator.attach('xgboost-220212-2217-010-2fb6076e') # best model's details have been taken from 6.3.2 section
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
2022-02-12 22:32:33 Starting - Preparing the instances for training
2022-02-12 22:32:33 Downloading - Downloading input data
2022-02-12 22:32:33 Training - Training image download completed. Training in progress.
2022-02-12 22:32:33 Uploading - Uploading generated training model
2022-02-12 22:32:33 Completed - Training job completedArguments: train
[2022-02-12:22:32:20:INFO] Running standalone xgboost training.
[2022-02-12:22:32:20:INFO] Setting up HPO optimized metric to be : rmse
[2022-02-12:22:32:20:INFO] File size need to be processed in the node: 0.96mb. Available memory size in the node: 8375.31mb
[2022-02-12:22:32:20:INFO] Determined delimiter of CSV input is ','
[22:32:20] S3DistributionType set as FullyReplicated
[22:32:20] 7942x15 matrix with 119130 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2022-02-12:22:32:20:INFO] Determined delimiter of CSV input is ','
[22:32:20] S3DistributionType set as FullyReplicated
[22:32:20] 3464x15 matrix with 51960 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 16 extra nodes, 72 pruned nodes, max_depth=4
[0]#011train-rmse:0.4935#011validation-rmse:0.493756
Multiple eval metrics have been passed: 'validation-rmse' will be used for early stopping.
Will train until validation-rmse hasn't improved in 10 rounds.
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 28 extra nodes, 60 pruned nodes, max_depth=6
[1]#011train-rmse:0.488017#011validation-rmse:0.489021
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 70 pruned nodes, max_depth=4
[2]#011train-rmse:0.484897#011validation-rmse:0.485948
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 18 extra nodes, 66 pruned nodes, max_depth=6
[3]#011train-rmse:0.48252#011validation-rmse:0.483869
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 62 pruned nodes, max_depth=2
[4]#011train-rmse:0.481348#011validation-rmse:0.482688
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 84 pruned nodes, max_depth=5
[5]#011train-rmse:0.480112#011validation-rmse:0.481424
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 62 pruned nodes, max_depth=4
[6]#011train-rmse:0.479387#011validation-rmse:0.480599
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 66 pruned nodes, max_depth=2
[7]#011train-rmse:0.479029#011validation-rmse:0.48022
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 58 pruned nodes, max_depth=4
[8]#011train-rmse:0.478106#011validation-rmse:0.479349
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 76 pruned nodes, max_depth=2
[9]#011train-rmse:0.477848#011validation-rmse:0.479014
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 66 pruned nodes, max_depth=3
[10]#011train-rmse:0.477642#011validation-rmse:0.478897
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 58 pruned nodes, max_depth=2
[11]#011train-rmse:0.477382#011validation-rmse:0.478691
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 58 pruned nodes, max_depth=3
[12]#011train-rmse:0.477153#011validation-rmse:0.478852
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 60 pruned nodes, max_depth=5
[13]#011train-rmse:0.47662#011validation-rmse:0.47863
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 50 pruned nodes, max_depth=0
[14]#011train-rmse:0.47662#011validation-rmse:0.478632
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 50 pruned nodes, max_depth=4
[15]#011train-rmse:0.476378#011validation-rmse:0.478457
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 18 extra nodes, 58 pruned nodes, max_depth=6
[16]#011train-rmse:0.475806#011validation-rmse:0.478195
[22:32:20] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 58 pruned nodes, max_depth=0
[17]#011train-rmse:0.475807#011validation-rmse:0.478194
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 80 pruned nodes, max_depth=0
[18]#011train-rmse:0.47581#011validation-rmse:0.478192
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 70 pruned nodes, max_depth=0
[19]#011train-rmse:0.475809#011validation-rmse:0.478192
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 62 pruned nodes, max_depth=0
[20]#011train-rmse:0.475809#011validation-rmse:0.478192
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 82 pruned nodes, max_depth=0
[21]#011train-rmse:0.475806#011validation-rmse:0.478195
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 82 pruned nodes, max_depth=0
[22]#011train-rmse:0.475806#011validation-rmse:0.478197
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 56 pruned nodes, max_depth=0
[23]#011train-rmse:0.475806#011validation-rmse:0.478194
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 68 pruned nodes, max_depth=0
[24]#011train-rmse:0.475809#011validation-rmse:0.478192
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 96 pruned nodes, max_depth=0
[25]#011train-rmse:0.475807#011validation-rmse:0.478193
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 16 pruned nodes, max_depth=5
[26]#011train-rmse:0.475598#011validation-rmse:0.478206
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 32 pruned nodes, max_depth=6
[27]#011train-rmse:0.475279#011validation-rmse:0.478149
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 86 pruned nodes, max_depth=3
[28]#011train-rmse:0.475122#011validation-rmse:0.477953
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 66 pruned nodes, max_depth=0
[29]#011train-rmse:0.475128#011validation-rmse:0.477951
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 44 pruned nodes, max_depth=6
[30]#011train-rmse:0.474833#011validation-rmse:0.478008
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 100 pruned nodes, max_depth=4
[31]#011train-rmse:0.474681#011validation-rmse:0.477983
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 74 pruned nodes, max_depth=4
[32]#011train-rmse:0.474248#011validation-rmse:0.477832
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 54 pruned nodes, max_depth=0
[33]#011train-rmse:0.474249#011validation-rmse:0.477832
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 52 pruned nodes, max_depth=0
[34]#011train-rmse:0.474249#011validation-rmse:0.477832
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 60 pruned nodes, max_depth=0
[35]#011train-rmse:0.474246#011validation-rmse:0.477833
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 66 pruned nodes, max_depth=0
[36]#011train-rmse:0.474247#011validation-rmse:0.477833
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 62 pruned nodes, max_depth=0
[37]#011train-rmse:0.474245#011validation-rmse:0.477835
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 92 pruned nodes, max_depth=0
[38]#011train-rmse:0.474246#011validation-rmse:0.477842
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 46 pruned nodes, max_depth=0
[39]#011train-rmse:0.474246#011validation-rmse:0.477842
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 50 pruned nodes, max_depth=6
[40]#011train-rmse:0.473962#011validation-rmse:0.477898
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 62 pruned nodes, max_depth=0
[41]#011train-rmse:0.473956#011validation-rmse:0.4779
[22:32:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 44 pruned nodes, max_depth=6
[42]#011train-rmse:0.473735#011validation-rmse:0.477887
Stopping. Best iteration:
[32]#011train-rmse:0.474248#011validation-rmse:0.477832
Training seconds: 62
Billable seconds: 62
In [115]:
xgb_transformer = xgb_attached.transformer(instance_count = 1, instance_type = 'ml.m4.xlarge')
Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Using already existing model: xgboost-220212-2217-010-2fb6076e
In [116]:
xgb_transformer.transform(test_location_use_case_no_offer_info, content_type='text/csv', split_type='Line')
In [117]:
xgb_transformer.wait()
................................Arguments: serve
[2022-02-14 11:53:33 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 11:53:33 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 11:53:33 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 11:53:33 +0000] [21] [INFO] Booting worker with pid: 21
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:53:33:INFO] Model loaded successfully for worker : 21
[2022-02-14 11:53:33 +0000] [22] [INFO] Booting worker with pid: 22
[2022-02-14 11:53:33 +0000] [23] [INFO] Booting worker with pid: 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
Arguments: serve
[2022-02-14 11:53:33 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 11:53:33 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 11:53:33 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 11:53:33 +0000] [21] [INFO] Booting worker with pid: 21
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:53:33:INFO] Model loaded successfully for worker : 21
[2022-02-14 11:53:33 +0000] [22] [INFO] Booting worker with pid: 22
[2022-02-14 11:53:33 +0000] [23] [INFO] Booting worker with pid: 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:53:33:INFO] Model loaded successfully for worker : 22
[2022-02-14 11:53:33 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:53:33:INFO] Model loaded successfully for worker : 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:53:33:INFO] Model loaded successfully for worker : 24
[2022-02-14:11:53:33:INFO] Model loaded successfully for worker : 22
[2022-02-14 11:53:33 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:53:33:INFO] Model loaded successfully for worker : 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:11:53:33:INFO] Model loaded successfully for worker : 24
[2022-02-14:11:53:38:INFO] Sniff delimiter as ','
[2022-02-14:11:53:38:INFO] Determined delimiter of CSV input is ','
[2022-02-14:11:53:38:INFO] Sniff delimiter as ','
[2022-02-14:11:53:38:INFO] Determined delimiter of CSV input is ','
2022-02-14T11:53:38.087:[sagemaker logs]: MaxConcurrentTransforms=4, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD

In [118]:
!aws s3 cp --recursive $xgb_transformer.output_path $data_dir'/bogo'
download: s3://sagemaker-us-east-1-218287629635/xgboost-220212-2217-010-2fb6076e-2022-02-14-11-48-26-551/test_use_case_no_offer_info.csv.out to data/bogo/test_use_case_no_offer_info.csv.out
In [90]:
predictions_bogo = pd.read_csv(os.path.join(data_dir, 'bogo', 'test_use_case_no_offer_info.csv.out'), header=None)
predictions_bogo.values
Out[90]:
array([[0.34784952],
       [0.51648378],
       [0.66530418],
       [0.49361989],
       [0.6031279 ],
       [0.49369282],
       [0.54156268],
       [0.54697222],
       [0.65821224],
       [0.52763897],
       [0.56800151],
       [0.59128547],
       [0.46869329],
       [0.48510486],
       [0.5636034 ],
       [0.26497149],
       [0.59311551],
       [0.25384504],
       [0.53750831],
       [0.23528804],
       [0.51709688],
       [0.59516478],
       [0.27101216],
       [0.46993655],
       [0.45411482],
       [0.66290921],
       [0.37866411],
       [0.36555964],
       [0.53865707],
       [0.31015098],
       [0.30175054],
       [0.29317957],
       [0.49218771],
       [0.43372369],
       [0.49455509],
       [0.66611379],
       [0.36165205],
       [0.53865707],
       [0.53528559],
       [0.54152346],
       [0.3809557 ],
       [0.41275504],
       [0.54571813],
       [0.55569404],
       [0.62506521],
       [0.50914192],
       [0.4253985 ],
       [0.3876242 ],
       [0.48684868],
       [0.5715512 ],
       [0.25132778],
       [0.4937776 ],
       [0.48349661],
       [0.30358431],
       [0.55050707],
       [0.49905241],
       [0.61691046],
       [0.38786653],
       [0.4615128 ],
       [0.26006413],
       [0.20774812],
       [0.54368448],
       [0.53300393],
       [0.21139693],
       [0.46696907],
       [0.68939269],
       [0.6938203 ],
       [0.21425268],
       [0.53674078],
       [0.45911744],
       [0.4753888 ],
       [0.24990395],
       [0.63115489],
       [0.54262674],
       [0.44310927],
       [0.53674078],
       [0.56666929],
       [0.4245666 ],
       [0.39888191],
       [0.26176831],
       [0.316075  ],
       [0.37308034],
       [0.43461886],
       [0.55569404],
       [0.60897243],
       [0.53484547],
       [0.48617566],
       [0.2551159 ],
       [0.54281855],
       [0.48037136],
       [0.35620701],
       [0.30358431],
       [0.5797891 ],
       [0.57948953],
       [0.52410674],
       [0.69172549],
       [0.67301637],
       [0.6265102 ],
       [0.53300393],
       [0.53894365]])

7.3.3 Testing the discount model

In [121]:
xgb_attached = sagemaker.estimator.Estimator.attach('xgboost-220212-2350-004-7621399f') # best model's details have been taken from 6.3.3 section
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
2022-02-12 23:59:33 Starting - Preparing the instances for training
2022-02-12 23:59:33 Downloading - Downloading input data
2022-02-12 23:59:33 Training - Training image download completed. Training in progress.
2022-02-12 23:59:33 Uploading - Uploading generated training model
2022-02-12 23:59:33 Completed - Training job completedArguments: train
[2022-02-12:23:59:21:INFO] Running standalone xgboost training.
[2022-02-12:23:59:21:INFO] Setting up HPO optimized metric to be : rmse
[2022-02-12:23:59:21:INFO] File size need to be processed in the node: 1.16mb. Available memory size in the node: 8378.34mb
[2022-02-12:23:59:21:INFO] Determined delimiter of CSV input is ','
[23:59:21] S3DistributionType set as FullyReplicated
[23:59:21] 8938x15 matrix with 134070 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2022-02-12:23:59:21:INFO] Determined delimiter of CSV input is ','
[23:59:21] S3DistributionType set as FullyReplicated
[23:59:21] 3926x15 matrix with 58890 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 18 extra nodes, 32 pruned nodes, max_depth=5
[0]#011train-rmse:0.465408#011validation-rmse:0.463833
Multiple eval metrics have been passed: 'validation-rmse' will be used for early stopping.
Will train until validation-rmse hasn't improved in 10 rounds.
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 20 extra nodes, 36 pruned nodes, max_depth=5
[1]#011train-rmse:0.449059#011validation-rmse:0.446383
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 38 pruned nodes, max_depth=5
[2]#011train-rmse:0.441086#011validation-rmse:0.437332
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 42 pruned nodes, max_depth=5
[3]#011train-rmse:0.437256#011validation-rmse:0.433075
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 20 extra nodes, 36 pruned nodes, max_depth=5
[4]#011train-rmse:0.43452#011validation-rmse:0.430274
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 52 pruned nodes, max_depth=3
[5]#011train-rmse:0.433491#011validation-rmse:0.428857
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 42 pruned nodes, max_depth=4
[6]#011train-rmse:0.432598#011validation-rmse:0.428239
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 52 pruned nodes, max_depth=0
[7]#011train-rmse:0.432595#011validation-rmse:0.428248
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 38 pruned nodes, max_depth=5
[8]#011train-rmse:0.432083#011validation-rmse:0.427454
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 50 pruned nodes, max_depth=2
[9]#011train-rmse:0.43179#011validation-rmse:0.427258
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 40 pruned nodes, max_depth=3
[10]#011train-rmse:0.431386#011validation-rmse:0.426834
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 44 pruned nodes, max_depth=0
[11]#011train-rmse:0.431385#011validation-rmse:0.426856
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 50 pruned nodes, max_depth=1
[12]#011train-rmse:0.431218#011validation-rmse:0.426649
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 38 pruned nodes, max_depth=4
[13]#011train-rmse:0.430927#011validation-rmse:0.426312
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 52 pruned nodes, max_depth=0
[14]#011train-rmse:0.430927#011validation-rmse:0.42629
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 56 pruned nodes, max_depth=0
[15]#011train-rmse:0.43093#011validation-rmse:0.426271
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 34 pruned nodes, max_depth=0
[16]#011train-rmse:0.43093#011validation-rmse:0.426271
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[17]#011train-rmse:0.430931#011validation-rmse:0.426265
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[18]#011train-rmse:0.430929#011validation-rmse:0.426273
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 58 pruned nodes, max_depth=1
[19]#011train-rmse:0.430839#011validation-rmse:0.426269
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 44 pruned nodes, max_depth=0
[20]#011train-rmse:0.43084#011validation-rmse:0.426263
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 46 pruned nodes, max_depth=0
[21]#011train-rmse:0.430842#011validation-rmse:0.426252
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 46 pruned nodes, max_depth=0
[22]#011train-rmse:0.43084#011validation-rmse:0.426262
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 40 pruned nodes, max_depth=2
[23]#011train-rmse:0.430617#011validation-rmse:0.426239
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 46 pruned nodes, max_depth=0
[24]#011train-rmse:0.430619#011validation-rmse:0.426218
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[25]#011train-rmse:0.430617#011validation-rmse:0.426229
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 50 pruned nodes, max_depth=4
[26]#011train-rmse:0.430288#011validation-rmse:0.426333
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 54 pruned nodes, max_depth=0
[27]#011train-rmse:0.430287#011validation-rmse:0.426345
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 40 pruned nodes, max_depth=0
[28]#011train-rmse:0.430288#011validation-rmse:0.426366
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 40 pruned nodes, max_depth=0
[29]#011train-rmse:0.43029#011validation-rmse:0.426381
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 42 pruned nodes, max_depth=0
[30]#011train-rmse:0.430287#011validation-rmse:0.42634
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 36 pruned nodes, max_depth=0
[31]#011train-rmse:0.430287#011validation-rmse:0.426344
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 34 pruned nodes, max_depth=5
[32]#011train-rmse:0.429923#011validation-rmse:0.426356
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 32 pruned nodes, max_depth=0
[33]#011train-rmse:0.429918#011validation-rmse:0.426374
[23:59:21] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 42 pruned nodes, max_depth=0
[34]#011train-rmse:0.429916#011validation-rmse:0.4264
Stopping. Best iteration:
[24]#011train-rmse:0.430619#011validation-rmse:0.426218
Training seconds: 62
Billable seconds: 62
In [122]:
xgb_transformer = xgb_attached.transformer(instance_count = 1, instance_type = 'ml.m4.xlarge')
Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Using already existing model: xgboost-220212-2350-004-7621399f
In [123]:
xgb_transformer.transform(test_location_use_case_no_offer_info, content_type='text/csv', split_type='Line')
In [124]:
xgb_transformer.wait()
.................................
Arguments: serve
[2022-02-14 12:00:15 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 12:00:15 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 12:00:15 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 12:00:15 +0000] [22] [INFO] Booting worker with pid: 22
[2022-02-14 12:00:15 +0000] [23] [INFO] Booting worker with pid: 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)', 'urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:00:15:INFO] Model loaded successfully for worker : 22
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)', 'urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:00:15:INFO] Model loaded successfully for worker : 23
[2022-02-14 12:00:15 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)', 'urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14 12:00:15 +0000] [25] [INFO] Booting worker with pid: 25
[2022-02-14:12:00:15:INFO] Model loaded successfully for worker : 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)', 'urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:00:15:INFO] Model loaded successfully for worker : 25
[2022-02-14:12:00:20:INFO] Sniff delimiter as ','
[2022-02-14:12:00:20:INFO] Determined delimiter of CSV input is ','
2022-02-14T12:00:19.944:[sagemaker logs]: MaxConcurrentTransforms=4, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
In [125]:
!aws s3 cp --recursive $xgb_transformer.output_path $data_dir'/discount'
download: s3://sagemaker-us-east-1-218287629635/xgboost-220212-2350-004-7621399f-2022-02-14-11-54-43-714/test_use_case_no_offer_info.csv.out to data/discount/test_use_case_no_offer_info.csv.out
In [91]:
predictions_discount = pd.read_csv(os.path.join(data_dir, 'discount', 'test_use_case_no_offer_info.csv.out'), header=None)
predictions_discount.values
Out[91]:
array([[0.43810469],
       [0.78396946],
       [0.76748109],
       [0.7021699 ],
       [0.65977854],
       [0.66752064],
       [0.7021699 ],
       [0.65977854],
       [0.83544475],
       [0.67545527],
       [0.75578934],
       [0.75578934],
       [0.68892282],
       [0.74103361],
       [0.65073389],
       [0.25560528],
       [0.78396946],
       [0.37324941],
       [0.75737733],
       [0.1051168 ],
       [0.67408472],
       [0.82361948],
       [0.25304902],
       [0.67085236],
       [0.76392955],
       [0.80734247],
       [0.34444323],
       [0.67085236],
       [0.82395494],
       [0.2660901 ],
       [0.14200665],
       [0.25560528],
       [0.74262702],
       [0.2629182 ],
       [0.76252681],
       [0.80734247],
       [0.2796472 ],
       [0.78346813],
       [0.72343636],
       [0.81022251],
       [0.48268214],
       [0.47924781],
       [0.67408472],
       [0.61736888],
       [0.7546348 ],
       [0.66117913],
       [0.47972146],
       [0.34773421],
       [0.65307462],
       [0.62436587],
       [0.18152156],
       [0.7700485 ],
       [0.65685517],
       [0.30492058],
       [0.84133667],
       [0.75578934],
       [0.78641707],
       [0.44639593],
       [0.28766564],
       [0.21972728],
       [0.12916639],
       [0.78396946],
       [0.6770969 ],
       [0.16727211],
       [0.68892282],
       [0.82361948],
       [0.79913056],
       [0.19409035],
       [0.65307462],
       [0.61736888],
       [0.76252681],
       [0.37324941],
       [0.76748109],
       [0.76767534],
       [0.61016858],
       [0.65307462],
       [0.67545527],
       [0.76767534],
       [0.45996988],
       [0.19314907],
       [0.21287467],
       [0.48268214],
       [0.58243704],
       [0.74103361],
       [0.80009341],
       [0.66117913],
       [0.71306634],
       [0.23156774],
       [0.83136606],
       [0.52687448],
       [0.38354972],
       [0.36843902],
       [0.81244385],
       [0.70779443],
       [0.74002123],
       [0.82361948],
       [0.76748109],
       [0.83544475],
       [0.65977854],
       [0.78396946]])

7.3.4 Testing the informational model

In [128]:
xgb_attached = sagemaker.estimator.Estimator.attach('xgboost-220213-0029-020-84539665') # best model's details have been taken from 6.3.4 section
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
2022-02-13 01:01:11 Starting - Preparing the instances for training
2022-02-13 01:01:11 Downloading - Downloading input data
2022-02-13 01:01:11 Training - Training image download completed. Training in progress.
2022-02-13 01:01:11 Uploading - Uploading generated training model
2022-02-13 01:01:11 Completed - Training job completedArguments: train
[2022-02-13:01:01:01:INFO] Running standalone xgboost training.
[2022-02-13:01:01:01:INFO] Setting up HPO optimized metric to be : rmse
[2022-02-13:01:01:01:INFO] File size need to be processed in the node: 0.46mb. Available memory size in the node: 8359.67mb
[2022-02-13:01:01:01:INFO] Determined delimiter of CSV input is ','
[01:01:01] S3DistributionType set as FullyReplicated
[01:01:01] 4298x15 matrix with 64470 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2022-02-13:01:01:01:INFO] Determined delimiter of CSV input is ','
[01:01:01] S3DistributionType set as FullyReplicated
[01:01:01] 1726x15 matrix with 25890 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 34 pruned nodes, max_depth=5
[0]#011train-rmse:0.496345#011validation-rmse:0.496172
Multiple eval metrics have been passed: 'validation-rmse' will be used for early stopping.
Will train until validation-rmse hasn't improved in 10 rounds.
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 38 pruned nodes, max_depth=5
[1]#011train-rmse:0.493471#011validation-rmse:0.493349
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 42 pruned nodes, max_depth=4
[2]#011train-rmse:0.491422#011validation-rmse:0.491249
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 38 pruned nodes, max_depth=5
[3]#011train-rmse:0.489614#011validation-rmse:0.489519
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 44 pruned nodes, max_depth=2
[4]#011train-rmse:0.488548#011validation-rmse:0.488475
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 44 pruned nodes, max_depth=4
[5]#011train-rmse:0.487491#011validation-rmse:0.487387
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 40 pruned nodes, max_depth=3
[6]#011train-rmse:0.486753#011validation-rmse:0.486729
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 32 pruned nodes, max_depth=5
[7]#011train-rmse:0.485859#011validation-rmse:0.485986
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 48 pruned nodes, max_depth=1
[8]#011train-rmse:0.485608#011validation-rmse:0.485639
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 46 pruned nodes, max_depth=1
[9]#011train-rmse:0.485401#011validation-rmse:0.485344
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 50 pruned nodes, max_depth=5
[10]#011train-rmse:0.484904#011validation-rmse:0.484827
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 32 pruned nodes, max_depth=3
[11]#011train-rmse:0.484575#011validation-rmse:0.484439
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 50 pruned nodes, max_depth=0
[12]#011train-rmse:0.484575#011validation-rmse:0.484439
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2 extra nodes, 52 pruned nodes, max_depth=1
[13]#011train-rmse:0.484457#011validation-rmse:0.484248
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[14]#011train-rmse:0.484458#011validation-rmse:0.484249
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 20 pruned nodes, max_depth=5
[15]#011train-rmse:0.484022#011validation-rmse:0.48381
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4 extra nodes, 48 pruned nodes, max_depth=2
[16]#011train-rmse:0.483844#011validation-rmse:0.483719
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 44 pruned nodes, max_depth=0
[17]#011train-rmse:0.483843#011validation-rmse:0.483718
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 34 pruned nodes, max_depth=0
[18]#011train-rmse:0.483844#011validation-rmse:0.48372
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[19]#011train-rmse:0.483844#011validation-rmse:0.48372
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 46 pruned nodes, max_depth=4
[20]#011train-rmse:0.483526#011validation-rmse:0.483405
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[21]#011train-rmse:0.483526#011validation-rmse:0.483405
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[22]#011train-rmse:0.483529#011validation-rmse:0.48341
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[23]#011train-rmse:0.483531#011validation-rmse:0.483414
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[24]#011train-rmse:0.48353#011validation-rmse:0.483412
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[25]#011train-rmse:0.483529#011validation-rmse:0.48341
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 44 pruned nodes, max_depth=0
[26]#011train-rmse:0.483528#011validation-rmse:0.483409
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 38 pruned nodes, max_depth=4
[27]#011train-rmse:0.483265#011validation-rmse:0.483341
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 26 pruned nodes, max_depth=0
[28]#011train-rmse:0.483266#011validation-rmse:0.483342
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 28 pruned nodes, max_depth=0
[29]#011train-rmse:0.483263#011validation-rmse:0.483337
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 24 pruned nodes, max_depth=4
[30]#011train-rmse:0.482989#011validation-rmse:0.483138
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[31]#011train-rmse:0.482989#011validation-rmse:0.483138
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 32 pruned nodes, max_depth=0
[32]#011train-rmse:0.482989#011validation-rmse:0.483139
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 34 pruned nodes, max_depth=0
[33]#011train-rmse:0.48299#011validation-rmse:0.483141
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 40 pruned nodes, max_depth=0
[34]#011train-rmse:0.482989#011validation-rmse:0.483139
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 40 pruned nodes, max_depth=0
[35]#011train-rmse:0.48299#011validation-rmse:0.48314
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 30 pruned nodes, max_depth=0
[36]#011train-rmse:0.48299#011validation-rmse:0.483142
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 32 pruned nodes, max_depth=0
[37]#011train-rmse:0.482991#011validation-rmse:0.483143
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 50 pruned nodes, max_depth=0
[38]#011train-rmse:0.48299#011validation-rmse:0.483141
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 40 pruned nodes, max_depth=4
[39]#011train-rmse:0.482756#011validation-rmse:0.483091
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 36 pruned nodes, max_depth=0
[40]#011train-rmse:0.482756#011validation-rmse:0.483092
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 48 pruned nodes, max_depth=0
[41]#011train-rmse:0.482755#011validation-rmse:0.483091
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 38 pruned nodes, max_depth=0
[42]#011train-rmse:0.482756#011validation-rmse:0.483092
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 24 pruned nodes, max_depth=4
[43]#011train-rmse:0.482469#011validation-rmse:0.483037
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 36 pruned nodes, max_depth=0
[44]#011train-rmse:0.482469#011validation-rmse:0.483036
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 34 pruned nodes, max_depth=0
[45]#011train-rmse:0.482469#011validation-rmse:0.483036
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 42 pruned nodes, max_depth=5
[46]#011train-rmse:0.481998#011validation-rmse:0.482971
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 36 pruned nodes, max_depth=0
[47]#011train-rmse:0.481997#011validation-rmse:0.482969
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 42 pruned nodes, max_depth=0
[48]#011train-rmse:0.481998#011validation-rmse:0.48297
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 26 pruned nodes, max_depth=0
[49]#011train-rmse:0.481998#011validation-rmse:0.482971
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 26 pruned nodes, max_depth=0
[50]#011train-rmse:0.481998#011validation-rmse:0.482971
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8 extra nodes, 46 pruned nodes, max_depth=4
[51]#011train-rmse:0.481743#011validation-rmse:0.482996
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 28 pruned nodes, max_depth=0
[52]#011train-rmse:0.481744#011validation-rmse:0.482997
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 46 pruned nodes, max_depth=0
[53]#011train-rmse:0.481744#011validation-rmse:0.482998
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 32 pruned nodes, max_depth=0
[54]#011train-rmse:0.481743#011validation-rmse:0.482996
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 34 pruned nodes, max_depth=0
[55]#011train-rmse:0.481743#011validation-rmse:0.482996
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 28 pruned nodes, max_depth=0
[56]#011train-rmse:0.481743#011validation-rmse:0.482994
[01:01:01] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 30 pruned nodes, max_depth=0
[57]#011train-rmse:0.481744#011validation-rmse:0.482997
Stopping. Best iteration:
[47]#011train-rmse:0.481997#011validation-rmse:0.482969
Training seconds: 72
Billable seconds: 72
In [129]:
xgb_transformer = xgb_attached.transformer(instance_count = 1, instance_type = 'ml.m4.xlarge')
Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Using already existing model: xgboost-220213-0029-020-84539665
In [130]:
xgb_transformer.transform(test_location_use_case_no_offer_info, content_type='text/csv', split_type='Line')
In [131]:
xgb_transformer.wait()
...................................Arguments: serve
[2022-02-14 12:09:26 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 12:09:26 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 12:09:26 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 12:09:26 +0000] [22] [INFO] Booting worker with pid: 22
[2022-02-14 12:09:26 +0000] [23] [INFO] Booting worker with pid: 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:09:26:INFO] Model loaded successfully for worker : 22
Arguments: serve
[2022-02-14 12:09:26 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2022-02-14 12:09:26 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2022-02-14 12:09:26 +0000] [1] [INFO] Using worker: gevent
[2022-02-14 12:09:26 +0000] [22] [INFO] Booting worker with pid: 22
[2022-02-14 12:09:26 +0000] [23] [INFO] Booting worker with pid: 23
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:09:26:INFO] Model loaded successfully for worker : 22
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:09:26:INFO] Model loaded successfully for worker : 23
[2022-02-14 12:09:26 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:09:26:INFO] Model loaded successfully for worker : 24
[2022-02-14 12:09:26 +0000] [25] [INFO] Booting worker with pid: 25
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:09:27:INFO] Model loaded successfully for worker : 25
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:09:26:INFO] Model loaded successfully for worker : 23
[2022-02-14 12:09:26 +0000] [24] [INFO] Booting worker with pid: 24
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:09:26:INFO] Model loaded successfully for worker : 24
[2022-02-14 12:09:26 +0000] [25] [INFO] Booting worker with pid: 25
/opt/amazon/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:65: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/opt/amazon/lib/python3.7/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/opt/amazon/lib/python3.7/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all(subprocess=True)
[2022-02-14:12:09:27:INFO] Model loaded successfully for worker : 25
[2022-02-14:12:09:31:INFO] Sniff delimiter as ','
[2022-02-14:12:09:31:INFO] Determined delimiter of CSV input is ','
[2022-02-14:12:09:31:INFO] Sniff delimiter as ','
[2022-02-14:12:09:31:INFO] Determined delimiter of CSV input is ','
2022-02-14T12:09:31.114:[sagemaker logs]: MaxConcurrentTransforms=4, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD

In [138]:
!aws s3 cp --recursive $xgb_transformer.output_path $data_dir'/informational'
download: s3://sagemaker-us-east-1-218287629635/xgboost-220213-0029-020-84539665-2022-02-14-12-03-49-143/test_use_case_no_offer_info.csv.out to data/informational/test_use_case_no_offer_info.csv.out
In [92]:
predictions_informational = pd.read_csv(os.path.join(data_dir, 'informational', 'test_use_case_no_offer_info.csv.out'), header=None)
predictions_informational.values
Out[92]:
array([[0.51078337],
       [0.28252146],
       [0.56562251],
       [0.30578232],
       [0.47785658],
       [0.60453463],
       [0.30578232],
       [0.43814433],
       [0.46392575],
       [0.51091087],
       [0.46392575],
       [0.50692505],
       [0.31271362],
       [0.32720113],
       [0.59641832],
       [0.35842085],
       [0.33776233],
       [0.37944996],
       [0.43407702],
       [0.31703123],
       [0.43814433],
       [0.48171777],
       [0.32720113],
       [0.31703123],
       [0.32720113],
       [0.28816587],
       [0.46392575],
       [0.28890491],
       [0.45640177],
       [0.2981464 ],
       [0.28252146],
       [0.37326831],
       [0.26040557],
       [0.50692505],
       [0.47080207],
       [0.31040466],
       [0.48171777],
       [0.46392575],
       [0.59641832],
       [0.35083523],
       [0.46392575],
       [0.45764863],
       [0.60453463],
       [0.30578232],
       [0.45640177],
       [0.47478473],
       [0.46392575],
       [0.51078337],
       [0.32720113],
       [0.43438151],
       [0.31703123],
       [0.47080207],
       [0.47785658],
       [0.45848432],
       [0.35083523],
       [0.2981464 ],
       [0.48171777],
       [0.48171777],
       [0.57585615],
       [0.26163456],
       [0.34316942],
       [0.35083523],
       [0.44894522],
       [0.29598522],
       [0.26163456],
       [0.46392575],
       [0.52284443],
       [0.26163456],
       [0.31458214],
       [0.26040557],
       [0.45918307],
       [0.40440091],
       [0.5720771 ],
       [0.2981464 ],
       [0.45848432],
       [0.31458214],
       [0.56090522],
       [0.2981464 ],
       [0.48171777],
       [0.31271362],
       [0.29315028],
       [0.46392575],
       [0.29731461],
       [0.32720113],
       [0.48537156],
       [0.48570809],
       [0.4542492 ],
       [0.32720113],
       [0.39714772],
       [0.47080207],
       [0.47785658],
       [0.44894522],
       [0.4709695 ],
       [0.47080207],
       [0.47482511],
       [0.51078337],
       [0.58417535],
       [0.45080078],
       [0.44894522],
       [0.28252146]])

7.4 Use case results

Now that we have all the results from all four models for each customer, we are able to provide an output which would help the marketing or any similar team to make a final decision about the most suitable offer for each customer

Same logic can be used in day to day business, for multiple customers or just one customer

In [93]:
for i in range (X_test_use_case.shape[0]):
    print (f"{'-'*50}Customer with index {i} {'-'*50}")
    
    customer = X_test_use_case.iloc[i]
    if customer.bogo==1:
        print (f"If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: {round(predictions_all_offers.iloc[i][0],2)}")
    elif customer.discount==1:
        print (f"If the customer receives only a discount offer, the likelihood of succefull completing the offer is: {round(predictions_all_offers.iloc[i][0],2)}")
    else:
        print (f"If the customer receives only a informational offer, the likelihood of succefull completing the offer is: {round(predictions_all_offers.iloc[i][0],2)}")
    
    bogo_prediction = round(predictions_bogo.iloc[i][0]*100,2)
    discount_prediction = round(predictions_discount.iloc[i][0]*100,2)
    informational_prediction = round(predictions_informational.iloc[i][0]*100,2)

    print (f"This customer is suitable for a bogo offer with a confidence of: {bogo_prediction}%")
    print (f"This customer is suitable for a discount offer with a confidence of: {discount_prediction}%")
    print (f"This customer is suitable for a informational offer with a confidence of: {informational_prediction}%")
    
    print ('')
    if bogo_prediction>discount_prediction and bogo_prediction>informational_prediction:
        if bogo_prediction<50:
            print (f"{'*'*20} Most suitable offer for this customer is a BOGO offer, BUT THE LIKELIHOOD OF COMPLETING IS BELOW 0.5. BETTER TO SEND AN INFORMATIONAL OFFER ONLY {'*'*20} ")
        else:
            print (f"{'*'*20} Most suitable offer for this customer is a BOGO offer {'*'*20} ")
    elif discount_prediction>bogo_prediction and discount_prediction>informational_prediction:
        if discount_prediction<50:
            print (f"{'*'*20} Most suitable offer for this customer is a DISCOUNT offer, BUT THE LIKELIHOOD OF COMPLETING IS BELOW 0.5. BETTER TO SEND AN INFORMATIONAL OFFER ONLY {'*'*20} ")
        else:
            print (f"{'*'*20} Most suitable offer for this customer is a DISCOUNT offer {'*'*20} ")
    else:
        print (f"{'*'*20} Most suitable offer for this customer is a INFORMATIONAL offer {'*'*20} ")
    
    print ('')
    print ('')
--------------------------------------------------Customer with index 0 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.42
This customer is suitable for a bogo offer with a confidence of: 34.78%
This customer is suitable for a discount offer with a confidence of: 43.81%
This customer is suitable for a informational offer with a confidence of: 51.08%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 1 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.55
This customer is suitable for a bogo offer with a confidence of: 51.65%
This customer is suitable for a discount offer with a confidence of: 78.4%
This customer is suitable for a informational offer with a confidence of: 28.25%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 2 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.56
This customer is suitable for a bogo offer with a confidence of: 66.53%
This customer is suitable for a discount offer with a confidence of: 76.75%
This customer is suitable for a informational offer with a confidence of: 56.56%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 3 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.66
This customer is suitable for a bogo offer with a confidence of: 49.36%
This customer is suitable for a discount offer with a confidence of: 70.22%
This customer is suitable for a informational offer with a confidence of: 30.58%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 4 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.59
This customer is suitable for a bogo offer with a confidence of: 60.31%
This customer is suitable for a discount offer with a confidence of: 65.98%
This customer is suitable for a informational offer with a confidence of: 47.79%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 5 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.55
This customer is suitable for a bogo offer with a confidence of: 49.37%
This customer is suitable for a discount offer with a confidence of: 66.75%
This customer is suitable for a informational offer with a confidence of: 60.45%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 6 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.56
This customer is suitable for a bogo offer with a confidence of: 54.16%
This customer is suitable for a discount offer with a confidence of: 70.22%
This customer is suitable for a informational offer with a confidence of: 30.58%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 7 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.59
This customer is suitable for a bogo offer with a confidence of: 54.7%
This customer is suitable for a discount offer with a confidence of: 65.98%
This customer is suitable for a informational offer with a confidence of: 43.81%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 8 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.81
This customer is suitable for a bogo offer with a confidence of: 65.82%
This customer is suitable for a discount offer with a confidence of: 83.54%
This customer is suitable for a informational offer with a confidence of: 46.39%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 9 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.67
This customer is suitable for a bogo offer with a confidence of: 52.76%
This customer is suitable for a discount offer with a confidence of: 67.55%
This customer is suitable for a informational offer with a confidence of: 51.09%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 10 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.74
This customer is suitable for a bogo offer with a confidence of: 56.8%
This customer is suitable for a discount offer with a confidence of: 75.58%
This customer is suitable for a informational offer with a confidence of: 46.39%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 11 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.75
This customer is suitable for a bogo offer with a confidence of: 59.13%
This customer is suitable for a discount offer with a confidence of: 75.58%
This customer is suitable for a informational offer with a confidence of: 50.69%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 12 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.72
This customer is suitable for a bogo offer with a confidence of: 46.87%
This customer is suitable for a discount offer with a confidence of: 68.89%
This customer is suitable for a informational offer with a confidence of: 31.27%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 13 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.72
This customer is suitable for a bogo offer with a confidence of: 48.51%
This customer is suitable for a discount offer with a confidence of: 74.1%
This customer is suitable for a informational offer with a confidence of: 32.72%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 14 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.59
This customer is suitable for a bogo offer with a confidence of: 56.36%
This customer is suitable for a discount offer with a confidence of: 65.07%
This customer is suitable for a informational offer with a confidence of: 59.64%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 15 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.28
This customer is suitable for a bogo offer with a confidence of: 26.5%
This customer is suitable for a discount offer with a confidence of: 25.56%
This customer is suitable for a informational offer with a confidence of: 35.84%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 16 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.67
This customer is suitable for a bogo offer with a confidence of: 59.31%
This customer is suitable for a discount offer with a confidence of: 78.4%
This customer is suitable for a informational offer with a confidence of: 33.78%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 17 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.35
This customer is suitable for a bogo offer with a confidence of: 25.38%
This customer is suitable for a discount offer with a confidence of: 37.32%
This customer is suitable for a informational offer with a confidence of: 37.94%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 18 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.75
This customer is suitable for a bogo offer with a confidence of: 53.75%
This customer is suitable for a discount offer with a confidence of: 75.74%
This customer is suitable for a informational offer with a confidence of: 43.41%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 19 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.12
This customer is suitable for a bogo offer with a confidence of: 23.53%
This customer is suitable for a discount offer with a confidence of: 10.51%
This customer is suitable for a informational offer with a confidence of: 31.7%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 20 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.68
This customer is suitable for a bogo offer with a confidence of: 51.71%
This customer is suitable for a discount offer with a confidence of: 67.41%
This customer is suitable for a informational offer with a confidence of: 43.81%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 21 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.59
This customer is suitable for a bogo offer with a confidence of: 59.52%
This customer is suitable for a discount offer with a confidence of: 82.36%
This customer is suitable for a informational offer with a confidence of: 48.17%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 22 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.29
This customer is suitable for a bogo offer with a confidence of: 27.1%
This customer is suitable for a discount offer with a confidence of: 25.3%
This customer is suitable for a informational offer with a confidence of: 32.72%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 23 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.51
This customer is suitable for a bogo offer with a confidence of: 46.99%
This customer is suitable for a discount offer with a confidence of: 67.09%
This customer is suitable for a informational offer with a confidence of: 31.7%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 24 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.74
This customer is suitable for a bogo offer with a confidence of: 45.41%
This customer is suitable for a discount offer with a confidence of: 76.39%
This customer is suitable for a informational offer with a confidence of: 32.72%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 25 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.68
This customer is suitable for a bogo offer with a confidence of: 66.29%
This customer is suitable for a discount offer with a confidence of: 80.73%
This customer is suitable for a informational offer with a confidence of: 28.82%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 26 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.33
This customer is suitable for a bogo offer with a confidence of: 37.87%
This customer is suitable for a discount offer with a confidence of: 34.44%
This customer is suitable for a informational offer with a confidence of: 46.39%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 27 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.45
This customer is suitable for a bogo offer with a confidence of: 36.56%
This customer is suitable for a discount offer with a confidence of: 67.09%
This customer is suitable for a informational offer with a confidence of: 28.89%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 28 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.82
This customer is suitable for a bogo offer with a confidence of: 53.87%
This customer is suitable for a discount offer with a confidence of: 82.4%
This customer is suitable for a informational offer with a confidence of: 45.64%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 29 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.26
This customer is suitable for a bogo offer with a confidence of: 31.02%
This customer is suitable for a discount offer with a confidence of: 26.61%
This customer is suitable for a informational offer with a confidence of: 29.81%

******************** Most suitable offer for this customer is a BOGO offer, BUT THE LIKELIHOOD OF COMPLETING IS BELOW 0.5. BETTER TO SEND AN INFORMATIONAL OFFER ONLY ******************** 


--------------------------------------------------Customer with index 30 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.14
This customer is suitable for a bogo offer with a confidence of: 30.18%
This customer is suitable for a discount offer with a confidence of: 14.2%
This customer is suitable for a informational offer with a confidence of: 28.25%

******************** Most suitable offer for this customer is a BOGO offer, BUT THE LIKELIHOOD OF COMPLETING IS BELOW 0.5. BETTER TO SEND AN INFORMATIONAL OFFER ONLY ******************** 


--------------------------------------------------Customer with index 31 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.31
This customer is suitable for a bogo offer with a confidence of: 29.32%
This customer is suitable for a discount offer with a confidence of: 25.56%
This customer is suitable for a informational offer with a confidence of: 37.33%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 32 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.76
This customer is suitable for a bogo offer with a confidence of: 49.22%
This customer is suitable for a discount offer with a confidence of: 74.26%
This customer is suitable for a informational offer with a confidence of: 26.04%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 33 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.35
This customer is suitable for a bogo offer with a confidence of: 43.37%
This customer is suitable for a discount offer with a confidence of: 26.29%
This customer is suitable for a informational offer with a confidence of: 50.69%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 34 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.49
This customer is suitable for a bogo offer with a confidence of: 49.46%
This customer is suitable for a discount offer with a confidence of: 76.25%
This customer is suitable for a informational offer with a confidence of: 47.08%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 35 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.7
This customer is suitable for a bogo offer with a confidence of: 66.61%
This customer is suitable for a discount offer with a confidence of: 80.73%
This customer is suitable for a informational offer with a confidence of: 31.04%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 36 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.29
This customer is suitable for a bogo offer with a confidence of: 36.17%
This customer is suitable for a discount offer with a confidence of: 27.96%
This customer is suitable for a informational offer with a confidence of: 48.17%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 37 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.77
This customer is suitable for a bogo offer with a confidence of: 53.87%
This customer is suitable for a discount offer with a confidence of: 78.35%
This customer is suitable for a informational offer with a confidence of: 46.39%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 38 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.59
This customer is suitable for a bogo offer with a confidence of: 53.53%
This customer is suitable for a discount offer with a confidence of: 72.34%
This customer is suitable for a informational offer with a confidence of: 59.64%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 39 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.83
This customer is suitable for a bogo offer with a confidence of: 54.15%
This customer is suitable for a discount offer with a confidence of: 81.02%
This customer is suitable for a informational offer with a confidence of: 35.08%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 40 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.39
This customer is suitable for a bogo offer with a confidence of: 38.1%
This customer is suitable for a discount offer with a confidence of: 48.27%
This customer is suitable for a informational offer with a confidence of: 46.39%

******************** Most suitable offer for this customer is a DISCOUNT offer, BUT THE LIKELIHOOD OF COMPLETING IS BELOW 0.5. BETTER TO SEND AN INFORMATIONAL OFFER ONLY ******************** 


--------------------------------------------------Customer with index 41 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.49
This customer is suitable for a bogo offer with a confidence of: 41.28%
This customer is suitable for a discount offer with a confidence of: 47.92%
This customer is suitable for a informational offer with a confidence of: 45.76%

******************** Most suitable offer for this customer is a DISCOUNT offer, BUT THE LIKELIHOOD OF COMPLETING IS BELOW 0.5. BETTER TO SEND AN INFORMATIONAL OFFER ONLY ******************** 


--------------------------------------------------Customer with index 42 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.62
This customer is suitable for a bogo offer with a confidence of: 54.57%
This customer is suitable for a discount offer with a confidence of: 67.41%
This customer is suitable for a informational offer with a confidence of: 60.45%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 43 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.53
This customer is suitable for a bogo offer with a confidence of: 55.57%
This customer is suitable for a discount offer with a confidence of: 61.74%
This customer is suitable for a informational offer with a confidence of: 30.58%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 44 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.65
This customer is suitable for a bogo offer with a confidence of: 62.51%
This customer is suitable for a discount offer with a confidence of: 75.46%
This customer is suitable for a informational offer with a confidence of: 45.64%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 45 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.52
This customer is suitable for a bogo offer with a confidence of: 50.91%
This customer is suitable for a discount offer with a confidence of: 66.12%
This customer is suitable for a informational offer with a confidence of: 47.48%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 46 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.48
This customer is suitable for a bogo offer with a confidence of: 42.54%
This customer is suitable for a discount offer with a confidence of: 47.97%
This customer is suitable for a informational offer with a confidence of: 46.39%

******************** Most suitable offer for this customer is a DISCOUNT offer, BUT THE LIKELIHOOD OF COMPLETING IS BELOW 0.5. BETTER TO SEND AN INFORMATIONAL OFFER ONLY ******************** 


--------------------------------------------------Customer with index 47 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.33
This customer is suitable for a bogo offer with a confidence of: 38.76%
This customer is suitable for a discount offer with a confidence of: 34.77%
This customer is suitable for a informational offer with a confidence of: 51.08%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 48 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.51
This customer is suitable for a bogo offer with a confidence of: 48.68%
This customer is suitable for a discount offer with a confidence of: 65.31%
This customer is suitable for a informational offer with a confidence of: 32.72%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 49 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.56
This customer is suitable for a bogo offer with a confidence of: 57.16%
This customer is suitable for a discount offer with a confidence of: 62.44%
This customer is suitable for a informational offer with a confidence of: 43.44%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 50 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.18
This customer is suitable for a bogo offer with a confidence of: 25.13%
This customer is suitable for a discount offer with a confidence of: 18.15%
This customer is suitable for a informational offer with a confidence of: 31.7%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 51 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.76
This customer is suitable for a bogo offer with a confidence of: 49.38%
This customer is suitable for a discount offer with a confidence of: 77.0%
This customer is suitable for a informational offer with a confidence of: 47.08%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 52 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.65
This customer is suitable for a bogo offer with a confidence of: 48.35%
This customer is suitable for a discount offer with a confidence of: 65.69%
This customer is suitable for a informational offer with a confidence of: 47.79%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 53 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.26
This customer is suitable for a bogo offer with a confidence of: 30.36%
This customer is suitable for a discount offer with a confidence of: 30.49%
This customer is suitable for a informational offer with a confidence of: 45.85%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 54 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.58
This customer is suitable for a bogo offer with a confidence of: 55.05%
This customer is suitable for a discount offer with a confidence of: 84.13%
This customer is suitable for a informational offer with a confidence of: 35.08%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 55 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.48
This customer is suitable for a bogo offer with a confidence of: 49.91%
This customer is suitable for a discount offer with a confidence of: 75.58%
This customer is suitable for a informational offer with a confidence of: 29.81%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 56 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.66
This customer is suitable for a bogo offer with a confidence of: 61.69%
This customer is suitable for a discount offer with a confidence of: 78.64%
This customer is suitable for a informational offer with a confidence of: 48.17%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 57 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.42
This customer is suitable for a bogo offer with a confidence of: 38.79%
This customer is suitable for a discount offer with a confidence of: 44.64%
This customer is suitable for a informational offer with a confidence of: 48.17%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 58 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.35
This customer is suitable for a bogo offer with a confidence of: 46.15%
This customer is suitable for a discount offer with a confidence of: 28.77%
This customer is suitable for a informational offer with a confidence of: 57.59%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 59 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.21
This customer is suitable for a bogo offer with a confidence of: 26.01%
This customer is suitable for a discount offer with a confidence of: 21.97%
This customer is suitable for a informational offer with a confidence of: 26.16%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 60 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.14
This customer is suitable for a bogo offer with a confidence of: 20.77%
This customer is suitable for a discount offer with a confidence of: 12.92%
This customer is suitable for a informational offer with a confidence of: 34.32%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 61 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.52
This customer is suitable for a bogo offer with a confidence of: 54.37%
This customer is suitable for a discount offer with a confidence of: 78.4%
This customer is suitable for a informational offer with a confidence of: 35.08%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 62 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.68
This customer is suitable for a bogo offer with a confidence of: 53.3%
This customer is suitable for a discount offer with a confidence of: 67.71%
This customer is suitable for a informational offer with a confidence of: 44.89%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 63 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.15
This customer is suitable for a bogo offer with a confidence of: 21.14%
This customer is suitable for a discount offer with a confidence of: 16.73%
This customer is suitable for a informational offer with a confidence of: 29.6%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 64 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.71
This customer is suitable for a bogo offer with a confidence of: 46.7%
This customer is suitable for a discount offer with a confidence of: 68.89%
This customer is suitable for a informational offer with a confidence of: 26.16%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 65 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.74
This customer is suitable for a bogo offer with a confidence of: 68.94%
This customer is suitable for a discount offer with a confidence of: 82.36%
This customer is suitable for a informational offer with a confidence of: 46.39%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 66 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.68
This customer is suitable for a bogo offer with a confidence of: 69.38%
This customer is suitable for a discount offer with a confidence of: 79.91%
This customer is suitable for a informational offer with a confidence of: 52.28%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 67 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.18
This customer is suitable for a bogo offer with a confidence of: 21.43%
This customer is suitable for a discount offer with a confidence of: 19.41%
This customer is suitable for a informational offer with a confidence of: 26.16%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 68 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.55
This customer is suitable for a bogo offer with a confidence of: 53.67%
This customer is suitable for a discount offer with a confidence of: 65.31%
This customer is suitable for a informational offer with a confidence of: 31.46%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 69 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.64
This customer is suitable for a bogo offer with a confidence of: 45.91%
This customer is suitable for a discount offer with a confidence of: 61.74%
This customer is suitable for a informational offer with a confidence of: 26.04%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 70 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.78
This customer is suitable for a bogo offer with a confidence of: 47.54%
This customer is suitable for a discount offer with a confidence of: 76.25%
This customer is suitable for a informational offer with a confidence of: 45.92%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 71 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.31
This customer is suitable for a bogo offer with a confidence of: 24.99%
This customer is suitable for a discount offer with a confidence of: 37.32%
This customer is suitable for a informational offer with a confidence of: 40.44%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 72 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.59
This customer is suitable for a bogo offer with a confidence of: 63.12%
This customer is suitable for a discount offer with a confidence of: 76.75%
This customer is suitable for a informational offer with a confidence of: 57.21%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 73 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.48
This customer is suitable for a bogo offer with a confidence of: 54.26%
This customer is suitable for a discount offer with a confidence of: 76.77%
This customer is suitable for a informational offer with a confidence of: 29.81%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 74 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.62
This customer is suitable for a bogo offer with a confidence of: 44.31%
This customer is suitable for a discount offer with a confidence of: 61.02%
This customer is suitable for a informational offer with a confidence of: 45.85%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 75 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.55
This customer is suitable for a bogo offer with a confidence of: 53.67%
This customer is suitable for a discount offer with a confidence of: 65.31%
This customer is suitable for a informational offer with a confidence of: 31.46%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 76 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.55
This customer is suitable for a bogo offer with a confidence of: 56.67%
This customer is suitable for a discount offer with a confidence of: 67.55%
This customer is suitable for a informational offer with a confidence of: 56.09%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 77 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.72
This customer is suitable for a bogo offer with a confidence of: 42.46%
This customer is suitable for a discount offer with a confidence of: 76.77%
This customer is suitable for a informational offer with a confidence of: 29.81%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 78 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.48
This customer is suitable for a bogo offer with a confidence of: 39.89%
This customer is suitable for a discount offer with a confidence of: 46.0%
This customer is suitable for a informational offer with a confidence of: 48.17%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 79 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.28
This customer is suitable for a bogo offer with a confidence of: 26.18%
This customer is suitable for a discount offer with a confidence of: 19.31%
This customer is suitable for a informational offer with a confidence of: 31.27%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 80 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.2
This customer is suitable for a bogo offer with a confidence of: 31.61%
This customer is suitable for a discount offer with a confidence of: 21.29%
This customer is suitable for a informational offer with a confidence of: 29.32%

******************** Most suitable offer for this customer is a BOGO offer, BUT THE LIKELIHOOD OF COMPLETING IS BELOW 0.5. BETTER TO SEND AN INFORMATIONAL OFFER ONLY ******************** 


--------------------------------------------------Customer with index 81 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.39
This customer is suitable for a bogo offer with a confidence of: 37.31%
This customer is suitable for a discount offer with a confidence of: 48.27%
This customer is suitable for a informational offer with a confidence of: 46.39%

******************** Most suitable offer for this customer is a DISCOUNT offer, BUT THE LIKELIHOOD OF COMPLETING IS BELOW 0.5. BETTER TO SEND AN INFORMATIONAL OFFER ONLY ******************** 


--------------------------------------------------Customer with index 82 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.45
This customer is suitable for a bogo offer with a confidence of: 43.46%
This customer is suitable for a discount offer with a confidence of: 58.24%
This customer is suitable for a informational offer with a confidence of: 29.73%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 83 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.52
This customer is suitable for a bogo offer with a confidence of: 55.57%
This customer is suitable for a discount offer with a confidence of: 74.1%
This customer is suitable for a informational offer with a confidence of: 32.72%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 84 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.64
This customer is suitable for a bogo offer with a confidence of: 60.9%
This customer is suitable for a discount offer with a confidence of: 80.01%
This customer is suitable for a informational offer with a confidence of: 48.54%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 85 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.52
This customer is suitable for a bogo offer with a confidence of: 53.48%
This customer is suitable for a discount offer with a confidence of: 66.12%
This customer is suitable for a informational offer with a confidence of: 48.57%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 86 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.78
This customer is suitable for a bogo offer with a confidence of: 48.62%
This customer is suitable for a discount offer with a confidence of: 71.31%
This customer is suitable for a informational offer with a confidence of: 45.42%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 87 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.22
This customer is suitable for a bogo offer with a confidence of: 25.51%
This customer is suitable for a discount offer with a confidence of: 23.16%
This customer is suitable for a informational offer with a confidence of: 32.72%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 88 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.85
This customer is suitable for a bogo offer with a confidence of: 54.28%
This customer is suitable for a discount offer with a confidence of: 83.14%
This customer is suitable for a informational offer with a confidence of: 39.71%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 89 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.45
This customer is suitable for a bogo offer with a confidence of: 48.04%
This customer is suitable for a discount offer with a confidence of: 52.69%
This customer is suitable for a informational offer with a confidence of: 47.08%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 90 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.39
This customer is suitable for a bogo offer with a confidence of: 35.62%
This customer is suitable for a discount offer with a confidence of: 38.35%
This customer is suitable for a informational offer with a confidence of: 47.79%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 91 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.36
This customer is suitable for a bogo offer with a confidence of: 30.36%
This customer is suitable for a discount offer with a confidence of: 36.84%
This customer is suitable for a informational offer with a confidence of: 44.89%

******************** Most suitable offer for this customer is a INFORMATIONAL offer ******************** 


--------------------------------------------------Customer with index 92 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.81
This customer is suitable for a bogo offer with a confidence of: 57.98%
This customer is suitable for a discount offer with a confidence of: 81.24%
This customer is suitable for a informational offer with a confidence of: 47.1%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 93 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.61
This customer is suitable for a bogo offer with a confidence of: 57.95%
This customer is suitable for a discount offer with a confidence of: 70.78%
This customer is suitable for a informational offer with a confidence of: 47.08%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 94 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.72
This customer is suitable for a bogo offer with a confidence of: 52.41%
This customer is suitable for a discount offer with a confidence of: 74.0%
This customer is suitable for a informational offer with a confidence of: 47.48%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 95 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.71
This customer is suitable for a bogo offer with a confidence of: 69.17%
This customer is suitable for a discount offer with a confidence of: 82.36%
This customer is suitable for a informational offer with a confidence of: 51.08%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 96 --------------------------------------------------
If the customer receives only a informational offer, the likelihood of succefull completing the offer is: 0.6
This customer is suitable for a bogo offer with a confidence of: 67.3%
This customer is suitable for a discount offer with a confidence of: 76.75%
This customer is suitable for a informational offer with a confidence of: 58.42%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 97 --------------------------------------------------
If the customer receives only a discount offer, the likelihood of succefull completing the offer is: 0.82
This customer is suitable for a bogo offer with a confidence of: 62.65%
This customer is suitable for a discount offer with a confidence of: 83.54%
This customer is suitable for a informational offer with a confidence of: 45.08%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 98 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.54
This customer is suitable for a bogo offer with a confidence of: 53.3%
This customer is suitable for a discount offer with a confidence of: 65.98%
This customer is suitable for a informational offer with a confidence of: 44.89%

******************** Most suitable offer for this customer is a DISCOUNT offer ******************** 


--------------------------------------------------Customer with index 99 --------------------------------------------------
If the customer receives only a bogo offer, the likelihood of succefull completing the offer is: 0.55
This customer is suitable for a bogo offer with a confidence of: 53.89%
This customer is suitable for a discount offer with a confidence of: 78.4%
This customer is suitable for a informational offer with a confidence of: 28.25%

******************** Most suitable offer for this customer is a DISCOUNT offer ********************